Hugging Face Daily Papers · 4 min read

Q-ARVD: Quantizing Autoregressive Video Diffusion Models

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Paper: https://arxiv.org/abs/2605.21072 (arxiv:2605.21072)
Code: https://github.com/tsa18/Q-ARVD

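Published on May 20, 2026 · Submitted by Siao Tang on May 22, 2026
Authors: Siao Tang, Xinyin Ma, Gongfan Fang, Xingyi Yang, Xinchao Wang (National University of Singapore)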

AI-generated summary

Autoregressive video diffusion models face high inference costs that limit practical deployment, prompting the development of Q-ARVD, a novel quantization framework addressing frame-wise sensitivity imbalance and weight outlier patterns specific to these models.

Abstract

Autoregressive video diffusion models (ARVDs) have emerged as a promising architecture for streaming video generation, paving the way for real-time interactive video generation and world modeling. Despite their potential, the substantial inference cost of ARVDs remains a major obstacle to practical deployment, making model quantization a natural direction for improving efficiency. However, quantization for ARVDs remains largely unexplored.

Our empirical analysis shows that directly applying existing quantization schemes developed for standard diffusion transformers to ARVDs leads to suboptimal performance, revealing quantization behaviors that differ from those observed in bidirectional diffusion models. In this paper, we identify two critical challenges in quantizing ARVDs:

(C1) Highly unbalanced frame-wise quantization sensitivity. Error accumulation during autoregressive generation can induce severely skewed quantization sensitivity across frames, following an exponential-like decay pattern.
(C2) Prominent and heterogeneous outlier patterns in weights. Weight distributions exhibit pronounced outlier channels, whose patterns vary substantially across layer types and block depths.

To address these issues, we propose Q-ARVD, a novel framework for accurate ARVD quantization.

(S1) To tackle the highly unbalanced frame-wise sensitivity, Q-ARVD incorporates a final-quality aware frame-weighting mechanism into the quantization objective.
(S2) To prevent heterogeneous outliers from degrading performance, Q-ARVD introduces an outlier-aware adaptive dual-scale quantization, which automatically detects the presence and quantity of outlier channels for an arbitrary layer, and isolates them to protect normal channels.

Extensive experiments demonstrate the superiority of Q-ARVD.
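The abstract does not spell out how the frame weights in (S1) are computed, so the following is only an illustrative sketch of what a frame-weighted quantization objective could look like. It assumes, purely for illustration, that earlier frames matter more and that the weights decay exponentially with the frame index; the names `frame_weights`, `frame_weighted_mse`, and the `decay` parameter are hypothetical and not taken from the paper or its code.

```python
import numpy as np


def frame_weights(num_frames: int, decay: float = 0.5) -> np.ndarray:
    """Hypothetical weights: exponential decay over frame index, normalized to sum to 1.

    Assumption (not from the paper): frame 0 is the most sensitive frame.
    """
    w = np.exp(-decay * np.arange(num_frames))
    return w / w.sum()


def frame_weighted_mse(fp_out, q_out, weights) -> float:
    """Weighted sum of per-frame mean squared errors.

    fp_out, q_out: arrays of shape (frames, features) holding the
    full-precision and quantized outputs for each frame.
    """
    fp_out = np.asarray(fp_out, dtype=np.float64)
    q_out = np.asarray(q_out, dtype=np.float64)
    per_frame = ((fp_out - q_out) ** 2).mean(axis=1)  # one error value per frame
    return float((np.asarray(weights) * per_frame).sum())
```

With these weights, the same perturbation costs more when it lands on an early frame than on a late one, which is the kind of imbalance a frame-weighted objective is meant to encode. The actual weighting in Q-ARVD is described as "final-quality aware" and may be derived quite differently.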

Community

Paper submitter about 6 hours ago

Q-ARVD proposes the first quantization framework tailored for autoregressive video diffusion models. It introduces a final-quality guided frame-weighting mechanism to handle the unbalanced frame-wise quantization sensitivity, and an outlier-aware adaptive dual-scale strategy to address the heterogeneous outlier patterns.
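To make the dual-scale idea more concrete, here is a minimal sketch under stated assumptions. It is not the Q-ARVD algorithm: the detection rule (a channel is an outlier when its absolute maximum exceeds `k` times the median channel maximum), the symmetric 4-bit quantizer, and the function names are all hypothetical choices made for illustration. The only point it demonstrates is that giving outlier channels their own scale keeps them from inflating the step size used for normal channels.

```python
import numpy as np


def quantize_symmetric(w: np.ndarray, scale: float, bits: int = 4) -> np.ndarray:
    """Symmetric uniform fake-quantization: round to the integer grid, then rescale."""
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale


def detect_outlier_channels(weight: np.ndarray, k: float = 3.0) -> np.ndarray:
    """Hypothetical rule: flag columns whose abs-max exceeds k times the median abs-max."""
    ch_max = np.abs(weight).max(axis=0)
    return ch_max > k * np.median(ch_max)


def dual_scale_quantize(weight: np.ndarray, bits: int = 4, k: float = 3.0):
    """Quantize outlier and normal channels with two separate scales.

    Returns the dequantized weight and the boolean outlier mask.
    """
    weight = np.asarray(weight, dtype=np.float64)
    qmax = 2 ** (bits - 1) - 1
    mask = detect_outlier_channels(weight, k)
    out = np.empty_like(weight)
    for group in (mask, ~mask):
        if group.any():
            # Guard against an all-zero group producing a zero scale.
            scale = max(np.abs(weight[:, group]).max() / qmax, 1e-12)
            out[:, group] = quantize_symmetric(weight[:, group], scale, bits)
    return out, mask
```

If no channel is flagged, the sketch falls back to a single scale for the whole layer, which loosely mirrors the abstract's claim that the method detects both the presence and the quantity of outlier channels per layer.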

Get this paper in your agent:

hf papers read 2605.21072
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
