Hugging Face Daily Papers · May 26, 2026 · 5 min read

Channel-wise Vector Quantization

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.


We present Channel-wise Vector Quantization (CVQ), a novel image tokenization paradigm that replaces patch-wise tokens with channel-wise tokens. Unlike conventional vector quantization, which assigns a discrete token to each patch feature vector, CVQ quantizes each channel of the feature map. This formulation represents an image as discrete levels of visual details, rather than as a grid of spatial patches. Based on CVQ, we introduce a new visual autoregressive framework with "next-channel prediction". Instead of rendering images patch by patch in raster order, our Channel-wise Autoregressive (CAR) model predicts image channels sequentially, producing progressively enriched visual details. Specifically, it first sketches global structure and then refines fine-grained attributes, akin to a human artist's workflow. Empirically, we show that: (1) CVQ achieves 100% codebook utilization with a 16K+ codebook size without any bells and whistles, and substantially improves reconstruction quality over conventional VQ; and (2) CAR attains a DPG score of 86.7 and a GenEval score of 0.79, demonstrating strong effectiveness for text-to-image generation.
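To make the patch-wise versus channel-wise distinction concrete, here is a minimal NumPy sketch. It is not the authors' implementation: it assumes that "quantizes each channel" means flattening each H×W channel into one vector and snapping it to the nearest entry of a codebook, and every name, shape, and the random codebooks are hypothetical choices made for illustration. The last helper computes codebook utilization as the fraction of codes that are actually selected, which is the usual meaning of that metric.

```python
import numpy as np

def nearest_code(vectors, codebook):
    """For each row of `vectors`, return the index of the closest codebook row (squared L2)."""
    # Broadcast (N, 1, D) against (1, K, D) to get an (N, K) distance table.
    d2 = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def patchwise_tokens(feat, codebook):
    """Conventional VQ: one token per spatial position. Codebook shape: (K, C)."""
    C, H, W = feat.shape
    vectors = feat.reshape(C, H * W).T          # (H*W, C): one vector per patch
    return nearest_code(vectors, codebook).reshape(H, W)

def channelwise_tokens(feat, codebook):
    """Assumed channel-wise variant: one token per channel. Codebook shape: (K, H*W)."""
    C, H, W = feat.shape
    vectors = feat.reshape(C, H * W)            # (C, H*W): one vector per channel
    return nearest_code(vectors, codebook)

def codebook_utilization(ids, num_codes):
    """Fraction of codebook entries that appear at least once in `ids`."""
    return np.unique(ids).size / num_codes

# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
C, H, W, K = 8, 4, 4, 32
feat = rng.normal(size=(C, H, W))
patch_codebook = rng.normal(size=(K, C))
channel_codebook = rng.normal(size=(K, H * W))

patch_ids = patchwise_tokens(feat, patch_codebook)        # grid of tokens, shape (4, 4)
channel_ids = channelwise_tokens(feat, channel_codebook)  # sequence of tokens, shape (8,)
utilization = codebook_utilization(channel_ids, K)
print(patch_ids.shape, channel_ids.shape)
```

Under this reading, the token count is set by the number of channels rather than by the spatial grid, which is what allows an image to be described as a sequence of detail levels instead of a raster of patches.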


arxiv:2605.26089


Published on May 25 · Submitted by Wei Song (SII) on May 26


Authors:

Wei Song, Tianhang Wang, Yitong Chen, Tong Zhang, Zuxuan Wu, Ming Li, Jiaqi Wang, Kaicheng Yu

AI-generated summary

Channel-wise Vector Quantization replaces patch-wise tokens with channel-wise tokens in image tokenization, enabling a next-channel prediction framework that generates images by sequentially refining visual details.
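The next-channel prediction loop can be sketched roughly as follows. This is an illustrative toy, not the CAR model: `toy_next_channel_logits` is a hypothetical stand-in that returns random logits where a trained network would condition on the text prompt and the channels generated so far, and the codebook shape (K, H*W) is an assumption about how channel tokens map back to feature maps.

```python
import numpy as np

def toy_next_channel_logits(prefix_ids, num_codes, rng):
    """Hypothetical placeholder for a trained model: random logits over the codebook."""
    return rng.normal(size=num_codes)

def generate_channel_tokens(num_channels, num_codes, rng):
    """Sample one token per channel, each step conditioned on the tokens so far."""
    ids = []
    for _ in range(num_channels):
        logits = toy_next_channel_logits(ids, num_codes, rng)
        probs = np.exp(logits - logits.max())   # numerically stable softmax
        probs /= probs.sum()
        ids.append(int(rng.choice(num_codes, p=probs)))
    return np.array(ids)

def tokens_to_feature_map(ids, codebook, H, W):
    """Look up each channel token and reshape to a (num_tokens, H, W) feature map."""
    return codebook[ids].reshape(len(ids), H, W)

# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
C, H, W, K = 8, 4, 4, 32
codebook = rng.normal(size=(K, H * W))

ids = generate_channel_tokens(C, K, rng)
full_map = tokens_to_feature_map(ids, codebook, H, W)         # all 8 channels
partial_map = tokens_to_feature_map(ids[:3], codebook, H, W)  # first 3 channels only
```

Because each step appends a whole channel, any prefix of the token sequence already yields a spatially complete, if coarser, feature map. That is one way to read the "sketch global structure, then refine" behavior the paper describes.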

Project page and GitHub: https://github.com/songweii/CVQ


Get this paper in your agent:

hf papers read 2605.26089

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

