Qwen-Image-Flash: Beyond Objective Design
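Authors: Tianhe Wu, Kun Yan, Zikai Zhou, Lihan Jiang, Jiahao Li, Jie Zhang, Kaiyuan Gao, Ningyuan Tang, Shengming Yin, Xiaoyue Chen, Xiao Xu, Yilei Chen, Yuxiang Chen, Yan Shu, Yixian Xu, Yanran Zhang, Zihao Liu, Zhendong Wang, Zekai Zhang, Deqing Li, Liang Peng, Yi Wang, Jingren Zhou, Chenfei Wu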
Abstract
Few-step distillation of visual generative models benefits from systematically studying the training recipe, not only the distillation objective: optimizing data composition, teacher guidance, and task mixture improves student performance.
Few-step distillation has become an effective strategy for accelerating advanced visual generative models, yet prior work has largely focused on distillation objectives. In this work, we revisit few-step distillation from a complementary perspective, focusing on the training recipe that critically shapes student performance. Using Qwen-Image-2.0 as a representative case, we systematically investigate three factors in distilling a unified model for text-to-image generation and instruction-guided image editing: data composition, teacher guidance, and task mixture. Our empirical analysis reveals several non-obvious behaviors, which motivate the development of Qwen-Image-Flash. Overall, our results suggest that effective few-step distillation requires not only carefully designed objectives, but also principled organization of the broader training pipeline.
arXiv: arxiv.org/abs/2606.03746