Hugging Face Daily Papers · June 12, 2026 · 4 min read

High-Fidelity Two-Step Image Generation via Teacher-Aligned End-to-End Distillation

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

We introduce Z-Image Turbo++, a high-quality 2-step image generation model distilled from the 8-step Z-Image Turbo teacher. With distribution-aligned adversarial learning, step-decoupled parameterization, and end-to-end training with iterative regularization, Z-Image Turbo++ substantially narrows the quality gap between 2-step and 8-step generation while keeping inference to only two denoising steps.

arxiv:2606.12575

Published on Jun 10

Submitted by Dongyang Liu (Chris Liu) on Jun 12 · Tongyi-MAI

Authors: Dongyang Liu, Ruoyi Du, David Liu, Dengyang Jiang, Liangchen Li, Qilong Wu, Zhen Li, Steven C. H. Hoi, Hongsheng Li, Peng Gao

Abstract

A 2-step image generation model is developed through distillation from an 8-step teacher using distribution-aligned adversarial learning, step-decoupled parameterization, and end-to-end training with iterative regularization.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Few-step diffusion distillation has become increasingly mature for 4-8-step generation, yet pushing further to 2 steps remains challenging. In this work, we introduce Z-Image Turbo++, a high-quality 2-step image generation model distilled from the 8-step Z-Image Turbo teacher. Our method addresses the central bottlenecks of increased task difficulty and limited model capacity in 2-step generation through three simple but effective design choices tailored to this regime. First, we propose Distribution-Aligned Adversarial Learning, which uses teacher-generated images rather than external real images as real samples for GAN training, providing a more attainable and informative adversarial target. Second, we adopt Step-Decoupled Parameterization, assigning independent model parameters to the two denoising steps to better match their distinct capacity demands. Third, we perform End-to-End Training with Iterative Regularization, allowing the first step to receive gradients from final image quality while preserving a meaningful intermediate generation through an explicit step-1 loss. Together, these designs substantially narrow the quality gap between 2-step and 8-step generation in both qualitative and quantitative evaluations, highlighting the potential of carefully tailored distillation strategies for improving the quality-efficiency trade-off in few-step generation.
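
The three design choices in the abstract can be illustrated with a toy scalar sketch. This is not the authors' implementation: the abstract gives no loss formulas, so the 1-D teacher, the affine student steps, the logistic discriminator, the squared-error form of the step-1 loss, and the weight `lam` are all assumptions made for illustration. The sketch only shows how the pieces could fit together: teacher outputs serve as the discriminator's "real" samples, each student step owns its parameters, and the adversarial term on the final output sends a nonzero gradient back to step 1.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def teacher(z, steps=8):
    # Hypothetical stand-in for the 8-step teacher: Euler steps that move
    # the noise sample z toward a z-dependent target.
    target = 2.0 + 0.5 * z
    x = z
    for i in range(steps):
        x += (target - x) / (steps - i)
    return x

def student(params, z):
    # Step-decoupled parameterization: each step has its own (scale, shift).
    a1, b1 = params["step1"]
    a2, b2 = params["step2"]
    x1 = a1 * z + b1   # intermediate generation after step 1
    x2 = a2 * x1 + b2  # final generation after step 2
    return x1, x2

def discriminator(disc, x):
    # Toy logistic discriminator with weight w and bias c.
    w, c = disc
    return sigmoid(w * x + c)

def discriminator_loss(disc, params, zs):
    # Distribution-aligned target: "real" samples are teacher generations,
    # not external data.
    total = 0.0
    for z in zs:
        _, fake = student(params, z)
        real = teacher(z)
        total += -math.log(discriminator(disc, real))
        total += -math.log(1.0 - discriminator(disc, fake))
    return total / len(zs)

def generator_loss(params, disc, zs, lam=0.5):
    # End-to-end adversarial term on the final output, plus an assumed
    # squared-error step-1 loss that keeps the intermediate meaningful.
    total = 0.0
    for z in zs:
        x1, x2 = student(params, z)
        adv = -math.log(discriminator(disc, x2))
        step1_reg = (x1 - teacher(z)) ** 2
        total += adv + lam * step1_reg
    return total / len(zs)

def finite_diff(params, key, idx, loss_fn, eps=1e-5):
    # Central finite difference of loss_fn with respect to params[key][idx].
    plus = {k: list(v) for k, v in params.items()}
    minus = {k: list(v) for k, v in params.items()}
    plus[key][idx] += eps
    minus[key][idx] -= eps
    return (loss_fn(plus) - loss_fn(minus)) / (2 * eps)

random.seed(0)
zs = [random.gauss(0.0, 1.0) for _ in range(64)]
params = {"step1": [1.0, 0.0], "step2": [1.0, 0.0]}
disc = (1.0, -1.0)

# With lam=0 only the final-output term remains, so any gradient on the
# step-1 shift must have flowed through step 2.
grad_step1 = finite_diff(
    params, "step1", 1, lambda p: generator_loss(p, disc, zs, lam=0.0)
)
print("d(final loss)/d(step-1 shift):", grad_step1)
print("generator loss with step-1 term:", generator_loss(params, disc, zs))
print("discriminator loss:", discriminator_loss(disc, params, zs))
```

In a real setup the two steps would be full denoising networks and the gradients would come from automatic differentiation; the finite-difference check here is only a dependency-free way to show that step 1 is trained by the final-output loss.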

Get this paper in your agent:

hf papers read 2606.12575

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
