Abstract
Asymmetric Flow Modeling enables efficient high-dimensional flow-based generation by restricting noise prediction to low-rank subspaces while maintaining full-dimensional data prediction, achieving superior performance in pixel-space text-to-image generation through effective fine-tuning from latent models.
AI-generated summary
Flow-based generation in high-dimensional spaces is difficult because velocity prediction requires modeling high-dimensional noise, even when data has strong low-rank structure. We present Asymmetric Flow Modeling (AsymFlow), a rank-asymmetric velocity parameterization that restricts noise prediction to a low-rank subspace while keeping data prediction full-dimensional. From this asymmetric prediction, AsymFlow analytically recovers the full-dimensional velocity without changing the network architecture or training/sampling procedures. On ImageNet 256×256, AsymFlow achieves a leading 1.57 FID, outperforming prior DiT/JiT-like pixel diffusion models by a large margin. AsymFlow also provides the first-ever route for finetuning pretrained latent flow models into pixel-space models: aligning the low-rank pixel subspace to the latent space gives a seamless initialization that preserves the latent model's high-level semantics and structure, so finetuning mainly improves low-level mismatches rather than relearning pixel generation. We show that the pixel AsymFlow model finetuned from FLUX.2 klein 9B establishes a new state of the art for pixel-space text-to-image generation, beating its latent base on HPSv3, DPG-Bench, and GenEval while qualitatively showing substantially improved visual realism.
Community
JiT x0-prediction is not enough for pixel generation. AsymFlow introduces rank-asymmetric flow parameterization for scalable pixel generation.
Core Method
Velocity prediction has a data term and a noise term. AsymFlow makes them rank-asymmetric:
- Data term is full-dimensional
- Noise term is in a low-rank subspace
The full-dimensional velocity is then recovered analytically from the asymmetric prediction, so standard flow-matching training and sampling apply unchanged.
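The idea can be illustrated with a minimal numpy sketch. Assume the standard flow-matching interpolation x_t = (1 − t)·x₀ + t·ε, whose velocity target is v = ε − x₀. The basis `U` and the split into a full-rank data prediction plus low-rank noise coefficients are illustrative assumptions here, not the paper's exact formulation (AsymFlow derives its subspace differently, e.g. aligned to a latent space when finetuning):

```python
import numpy as np

rng = np.random.default_rng(0)
D, r = 64, 8  # full dimension, rank of the noise subspace

# Hypothetical orthonormal basis U (D x r) spanning the low-rank
# subspace in which the noise term is predicted.
U, _ = np.linalg.qr(rng.standard_normal((D, r)))

def asym_velocity(x0_hat, c_hat):
    """Analytically recover the full-dimensional velocity from the
    asymmetric prediction: a full-rank data term x0_hat of shape (D,)
    and low-rank noise coefficients c_hat of shape (r,)."""
    eps_hat = U @ c_hat        # lift noise coefficients back to D dims
    return eps_hat - x0_hat    # v = eps - x0 for x_t = (1-t)*x0 + t*eps

# Toy check: when the noise truly lies in the subspace, the recovered
# velocity matches the flow-matching target exactly.
x0 = rng.standard_normal(D)
c = rng.standard_normal(r)
eps = U @ c                    # noise restricted to the subspace
v = asym_velocity(x0, c)
assert np.allclose(v, eps - x0)
```

Because the recovery is a fixed linear map, an ODE sampler can consume `v` exactly as it would a directly predicted velocity, which is why training and sampling procedures need no changes.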
State-of-the-Art Results
- 1.57 FID on ImageNet 256×256 (best among pixel-space flow models)
- Finetuning FLUX.2 klein into pixel space beats the original latent model on HPSv3, DPG-Bench, and GenEval (#1 overall on HPSv3)
