Hugging Face Daily Papers · 3 min read

L2P: Unlocking Latent Potential for Pixel Generation

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.12013


Published on May 12 · Submitted by chen (zhen-nan) on May 13
Authors: Zhennan Chen, Junwei Zhu, Xu Chen, Jiangning Zhang, Jiawei Chen, Zhuoqi Zeng, Wei Zhang, Chengjie Wang, Jian Yang, Ying Tai

Abstract

The Latent-to-Pixel (L2P) transfer paradigm efficiently leverages pre-trained latent diffusion models to build pixel-space models with minimal training overhead and high-resolution generation capabilities.

AI-generated summary

Pixel diffusion models have recently regained attention for visual generation. However, training advanced pixel-space models from scratch demands prohibitive computational and data resources. To address this, we propose the Latent-to-Pixel (L2P) transfer paradigm, an efficient framework that directly harnesses the rich knowledge of pre-trained latent diffusion models (LDMs) to build powerful pixel-space models. Specifically, L2P discards the variational autoencoder (VAE) in favor of large-patch tokenization and freezes the source LDM's intermediate layers, exclusively training shallow layers to learn the latent-to-pixel transformation. By utilizing LDM-generated synthetic images as the sole training corpus, L2P fits an already smooth data manifold, enabling rapid convergence with zero real-data collection. This strategy allows L2P to seamlessly migrate massive latent priors to the pixel space using only 8 GPUs. Furthermore, eliminating the VAE memory bottleneck unlocks native 4K ultra-high resolution generation. Extensive experiments across mainstream LDM architectures show that L2P incurs negligible training overhead, yet performs on par with the source LDM on DPG-Bench and reaches 93% performance on GenEval.
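The "large-patch tokenization" step in the abstract replaces the VAE: the raw pixel grid is chopped into non-overlapping patches that play the role of latent tokens. The following is a minimal NumPy sketch of that idea only; the function name, patch size, and shapes are illustrative assumptions, not from the paper.

```python
import numpy as np

def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping flattened patch tokens.

    Returns an (N, patch_size * patch_size * C) array, the pixel-space
    analogue of a latent token grid. Illustrative sketch only.
    """
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "size must divide evenly"
    p = patch_size
    # (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C) -> (N, p*p*C)
    patches = image.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, p * p * c)

# A 256x256 RGB image with 16x16 patches yields a 16x16 grid of 256 tokens,
# each of dimension 16*16*3 = 768 -- comparable in sequence length to a
# typical latent grid, which is what lets a pre-trained LDM backbone be reused.
tokens = patchify(np.zeros((256, 256, 3)), patch_size=16)
print(tokens.shape)  # (256, 768)
```

With larger patches the token count stays fixed as resolution grows more slowly, which is one intuition for why dropping the VAE need not explode sequence length.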

Community

Paper submitter (zhen-nan):

An efficient transfer paradigm enabling high-quality, end-to-end pixel-space diffusion with minimal computational overhead (even on consumer GPUs) and data requirements.

Project page: https://nju-pcalab.github.io/projects/L2P/
Code: https://github.com/NJU-PCALab/L2P


Get this paper in your agent:

hf papers read 2605.12013
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash


