Hugging Face Daily Papers · 4 min read

High-Fidelity Two-Step Image Generation via Teacher-Aligned End-to-End Distillation

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

We introduce Z-Image Turbo++, a high-quality 2-step image generation model distilled from the 8-step Z-Image Turbo teacher. With distribution-aligned adversarial learning, step-decoupled parameterization, and end-to-end training with iterative regularization, Z-Image Turbo++ substantially narrows the quality gap between 2-step and 8-step generation while keeping inference to only two denoising steps.

arxiv:2606.12575

Published on Jun 10 · Submitted by Dongyang Liu (Chris Liu) on Jun 12
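Authors: Dongyang Liu, Ruoyi Du, David Liu, Dengyang Jiang, Liangchen Li, Qilong Wu, Zhen Li, Steven C. H. Hoi, Hongsheng Li, Peng Gao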

Abstract

A 2-step image generation model is developed through distillation from an 8-step teacher using distribution-aligned adversarial learning, step-decoupled parameterization, and end-to-end training with iterative regularization.

Few-step diffusion distillation has become increasingly mature for 4-8-step generation, yet pushing further to 2 steps remains challenging. In this work, we introduce Z-Image Turbo++, a high-quality 2-step image generation model distilled from the 8-step Z-Image Turbo teacher. Our method addresses the central bottlenecks of increased task difficulty and limited model capacity in 2-step generation through three simple but effective design choices tailored to this regime. First, we propose Distribution-Aligned Adversarial Learning, which uses teacher-generated images rather than external real images as real samples for GAN training, providing a more attainable and informative adversarial target. Second, we adopt Step-Decoupled Parameterization, assigning independent model parameters to the two denoising steps to better match their distinct capacity demands. Third, we perform End-to-End Training with Iterative Regularization, allowing the first step to receive gradients from final image quality while preserving a meaningful intermediate generation through an explicit step-1 loss. Together, these designs substantially narrow the quality gap between 2-step and 8-step generation in both qualitative and quantitative evaluations, highlighting the potential of carefully tailored distillation strategies for improving the quality-efficiency trade-off in few-step generation.
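The second and third design choices can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the network, the form of the step-1 loss, or its weight. Here each denoising step is a hypothetical scale-and-shift map (`StepParams`), mean squared error against a teacher image stands in for both the final-quality objective and the step-1 loss, and `reg_weight` is an assumed hyperparameter.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]


@dataclass
class StepParams:
    """Hypothetical per-step parameters (toy stand-in for a full network)."""
    scale: float
    shift: float


def denoise(x: Vector, p: StepParams) -> Vector:
    # One toy "denoising step": an elementwise affine map.
    return [p.scale * v + p.shift for v in x]


def mse(a: Vector, b: Vector) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def two_step_generate(noise: Vector, step1: StepParams,
                      step2: StepParams) -> Tuple[Vector, Vector]:
    # Step-decoupled parameterization: each step has its own parameters.
    intermediate = denoise(noise, step1)
    final = denoise(intermediate, step2)
    return intermediate, final


def end_to_end_loss(noise: Vector, teacher_image: Vector,
                    step1: StepParams, step2: StepParams,
                    reg_weight: float = 0.5) -> float:
    intermediate, final = two_step_generate(noise, step1, step2)
    # Final-image term: depends on BOTH steps, so under autodiff it would
    # send gradients back through step 2 into step 1.
    final_loss = mse(final, teacher_image)
    # Explicit step-1 term (assumed form): keeps the intermediate output
    # close to a meaningful image instead of an arbitrary latent.
    step1_loss = mse(intermediate, teacher_image)
    return final_loss + reg_weight * step1_loss


loss = end_to_end_loss(
    noise=[1.0, 2.0],
    teacher_image=[3.0, 5.0],
    step1=StepParams(scale=2.0, shift=0.0),
    step2=StepParams(scale=1.0, shift=1.0),
)
print(loss)
```

In this toy setup the final output matches the teacher image exactly, so the whole loss comes from the step-1 term. That is the situation the regularizer is meant to penalize: a good final image reached through a poor intermediate one.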

Community

Dongyang Liu (Chris Liu) · Paper author · Paper submitter

We introduce Z-Image Turbo++, a high-quality 2-step image generation model distilled from the 8-step Z-Image Turbo teacher. With distribution-aligned adversarial learning, step-decoupled parameterization, and end-to-end training with iterative regularization, Z-Image Turbo++ substantially narrows the quality gap between 2-step and 8-step generation while keeping inference to only two denoising steps.
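As a rough illustration of distribution-aligned adversarial learning, the sketch below treats teacher-generated images as the "real" samples and student outputs as the "fake" ones. The hinge loss and the toy `mean_critic` discriminator are assumptions chosen for brevity; the abstract does not state which GAN objective or discriminator the authors use.

```python
from typing import Callable, Sequence

Image = Sequence[float]
Critic = Callable[[Image], float]


def hinge_discriminator_loss(critic: Critic,
                             teacher_images: Sequence[Image],
                             student_images: Sequence[Image]) -> float:
    # "Real" samples come from the 8-step teacher, not an external dataset.
    real_term = sum(max(0.0, 1.0 - critic(x)) for x in teacher_images)
    real_term /= len(teacher_images)
    # "Fake" samples come from the 2-step student.
    fake_term = sum(max(0.0, 1.0 + critic(x)) for x in student_images)
    fake_term /= len(student_images)
    return real_term + fake_term


def generator_loss(critic: Critic,
                   student_images: Sequence[Image]) -> float:
    # The student is rewarded when the critic scores its images highly.
    return -sum(critic(x) for x in student_images) / len(student_images)


def mean_critic(image: Image) -> float:
    # Hypothetical toy critic: scores an image by its mean pixel value.
    return sum(image) / len(image)


teacher_batch = [[2.0, 2.0], [0.5, 0.5]]
student_batch = [[-2.0, -2.0], [0.0, 0.0]]

d_loss = hinge_discriminator_loss(mean_critic, teacher_batch, student_batch)
g_loss = generator_loss(mean_critic, student_batch)
print(d_loss, g_loss)
```

Only the source of the real batch changes relative to a standard GAN setup, which is why the adversarial target is attainable: the student is asked to match a distribution that a model of the same family already produces.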
