Hugging Face Daily Papers · May 27, 2026 · 3 min read

JLT: Clean-Latent Prediction in Latent Diffusion Transformers

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.




arxiv:2605.27102


Published on May 26

Submitted by Guanyu Zhou on May 27 · akatsuki-neo


Authors: Funing Fu, Tenghui Wang, Junyong Cen, Qichao Zhu, Guanyu Zhou

Abstract

AI-generated summary: Latent diffusion models that predict the clean latent outperform a matched velocity-prediction model in a compressed representation, suggesting that prediction targets are representation-dependent geometric choices rather than interchangeable algebraic parameterizations.

Flow matching with clean-data prediction has shown that regressing the clean point can exploit low-dimensional structure more effectively than predicting an ambient noised quantity. We ask whether this principle remains useful after images are mapped into a learned latent space, where compression has already removed much of the raw pixel variability. We introduce JLT, a 130M latent diffusion Transformer over frozen FLUX.2 VAE codes, and compare clean-latent prediction with a matched velocity-prediction DiT under the same representation, backbone, and training settings. Although the three variables x, epsilon, and v are linearly convertible for a fixed corruption time, a local Gaussian analysis shows that velocity regression inherits an isotropic target-covariance floor and amplifies low-variance latent directions, while clean prediction damps them. On ImageNet 256 x 256, JLT-B/1 obtains FID-50K 2.50 with classifier-free guidance, with a large matched-target gap over velocity prediction. These results suggest that prediction targets in latent diffusion are representation-dependent geometric choices, rather than interchangeable algebraic parameterizations.
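The abstract's two technical claims, that x, epsilon, and v are linearly convertible at a fixed time and that the targets nonetheless behave differently on low-variance directions, can be illustrated with a toy calculation. The sketch below is not the paper's derivation. It assumes the linear path z_t = t * x + (1 - t) * eps with v = x - eps (a common flow-matching convention that this page does not confirm) and a one-dimensional Gaussian model with x ~ N(0, lam) and eps ~ N(0, 1). All function names are hypothetical.

```python
# Illustrative sketch only. Assumed convention (not taken from the JLT paper):
#   z_t = t * x + (1 - t) * eps,   v = x - eps,   t in (0, 1).

def to_velocity(x, eps):
    """Velocity target under the assumed linear path."""
    return x - eps

def x_from_v(z_t, v, t):
    """Recover the clean point from (z_t, v): x = z_t + (1 - t) * v."""
    return z_t + (1.0 - t) * v

def eps_from_v(z_t, v, t):
    """Recover the noise from (z_t, v): eps = z_t - t * v."""
    return z_t - t * v

def optimal_gains(lam, t):
    """Per-direction MMSE gains for the 1-D Gaussian toy model.

    Assumes x ~ N(0, lam) and eps ~ N(0, 1), independent.
    Returns (gain_x, gain_v) such that E[x | z_t] = gain_x * z_t
    and E[v | z_t] = gain_v * z_t.
    """
    var_z = t * t * lam + (1.0 - t) ** 2    # Var[z_t]
    gain_x = t * lam / var_z                # Cov[x, z_t] / Var[z_t]
    gain_v = (t * lam - (1.0 - t)) / var_z  # Cov[v, z_t] / Var[z_t]
    return gain_x, gain_v

def target_variances(lam):
    """Marginal variances of the two regression targets: (Var[x], Var[v])."""
    return lam, lam + 1.0

if __name__ == "__main__":
    t = 0.9
    for lam in (1.0, 1e-2, 1e-4):
        gain_x, gain_v = optimal_gains(lam, t)
        var_x, var_v = target_variances(lam)
        print(f"lam={lam:g} gain_x={gain_x:.4f} gain_v={gain_v:.4f} "
              f"Var[x]={var_x:g} Var[v]={var_v:g}")
```

In this toy model, as lam shrinks the clean-prediction gain goes to zero while the velocity gain approaches -1 / (1 - t) in magnitude, and Var[v] never drops below the unit noise variance. That is one reading of the "isotropic target-covariance floor" and the damping versus amplification contrast described in the abstract, offered here only as an assumption-laden illustration.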

Project page: https://github.com/akatsuki-neo/JLT

Community

TheMartyr (paper submitter), about 10 hours ago

This is the initial version of JLT. We will update it with complete experiments and results analysis and then revise it.



