Hugging Face Daily Papers · · 5 min read

GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.21605

Published on May 20 · Submitted by sixiang chen on May 22
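Authors: Sixiang Chen, Zhaohu Xing, Tian Ye, Xinyu Geng, Yunlong Lin, Jianyu Lai, Xuanhua He, Fuxiang Zhai, Jialin Gao, Lei Zhu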

Abstract

A self-evolving image generation framework uses tool-orchestrated trajectories and visual experience distillation to improve generative capabilities through iterative learning and reference-based prompting.

AI-generated summary

Open-ended image generation is no longer a simple prompt-to-image problem. High-quality generation often requires an agent to combine a model's internal generative ability with external resources. As requests become more diverse and demanding, we aim to develop a general image-generation agent that can self-evolve through trajectories and use tools more effectively across varied generation challenges. To this end, we propose GenEvolve, a self-evolving framework based on Tool-Orchestrated Visual Experience Distillation. In GenEvolve, each generation attempt is modeled as a tool-orchestrated trajectory, where the agent gathers evidence, selects references, invokes generation skills, and composes them into a prompt-reference program. Unlike existing agentic generation methods that mainly rely on image-level scalar rewards, GenEvolve compares multiple trajectories for the same request and abstracts best-worst differences into structured visual experience, provided only to a privileged teacher branch. Inspired by on-policy self-distillation, Visual Experience Distillation provides dense token-level supervision, helping the student internalize better search, knowledge activation, reference selection, and prompt construction. We further construct GenEvolve-Data and GenEvolve-Bench. Experiments on public benchmarks and GenEvolve-Bench show substantial gains over strong baselines, achieving state-of-the-art performance among current image-generation frameworks.

Project page: https://ephemeral182.github.io/GenEvolve/
Code: https://github.com/MeiGen-AI/GenEvolve
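The abstract names two mechanisms without spelling them out: contrasting several trajectories for the same request so that the best-worst difference becomes a structured "experience" that only a privileged teacher sees, and dense token-level supervision, where the student gets a learning signal at every token of its own sampled output rather than one image-level score. The following is a minimal, hypothetical Python sketch of those two ideas, not the authors' implementation. Everything in it is an assumption: the `Step`/`Trajectory` schema, the judge `score` used only to rank attempts, the set-difference form of the experience record, the `privileged_prompt` text format, and the use of per-token reverse KL, KL(student || teacher), as the distillation signal. Toy probability lists stand in for model outputs.

```python
import math
from dataclasses import dataclass


@dataclass
class Step:
    """One tool call in a trajectory (hypothetical schema)."""
    tool: str      # e.g. "search", "select_reference", "generate"
    argument: str  # query, reference id, or prompt passed to the tool


@dataclass
class Trajectory:
    """One tool-orchestrated attempt at a request (hypothetical schema)."""
    request: str
    steps: list[Step]
    score: float  # assumed judge score, used here only to rank attempts


def extract_experience(trajectories: list[Trajectory]) -> dict:
    """Contrast the best and worst attempts at the SAME request.

    Returns a structured record of tool calls made only by the best
    attempt ("do") and only by the worst attempt ("avoid").
    """
    if len(trajectories) < 2:
        raise ValueError("need at least two trajectories to contrast")
    if len({t.request for t in trajectories}) != 1:
        raise ValueError("all trajectories must share one request")
    ranked = sorted(trajectories, key=lambda t: t.score)
    worst, best = ranked[0], ranked[-1]
    best_calls = {(s.tool, s.argument) for s in best.steps}
    worst_calls = {(s.tool, s.argument) for s in worst.steps}
    return {
        "request": best.request,
        "do": sorted(best_calls - worst_calls),
        "avoid": sorted(worst_calls - best_calls),
    }


def privileged_prompt(experience: dict) -> str:
    """Teacher-only context built from the experience; the student never sees it."""
    do = "; ".join(f"{tool}({arg})" for tool, arg in experience["do"])
    avoid = "; ".join(f"{tool}({arg})" for tool, arg in experience["avoid"])
    return f"Request: {experience['request']}\nPrefer: {do}\nAvoid: {avoid}"


def per_token_reverse_kl(student_probs, teacher_probs) -> list[float]:
    """KL(student || teacher) at every position: one signal per token.

    Row t of each argument is the next-token distribution at position t of
    the same student-sampled (on-policy) sequence. The teacher is assumed to
    be the same model conditioned on privileged_prompt(...). Teacher
    probabilities must be strictly positive, as softmax outputs are.
    """
    if not student_probs or len(student_probs) != len(teacher_probs):
        raise ValueError("need matching, non-empty per-token distributions")
    # Terms with p_i == 0 contribute 0 by convention.
    return [sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
            for p, q in zip(student_probs, teacher_probs)]


def distillation_loss(student_probs, teacher_probs) -> float:
    """Mean of the per-token terms (one common reduction; an assumption)."""
    per_token = per_token_reverse_kl(student_probs, teacher_probs)
    return sum(per_token) / len(per_token)


if __name__ == "__main__":
    attempts = [
        Trajectory("a red fox in snow",
                   [Step("search", "red fox"), Step("generate", "fox, snow")], 0.4),
        Trajectory("a red fox in snow",
                   [Step("search", "red fox winter coat"),
                    Step("select_reference", "ref_2"),
                    Step("generate", "fox, snow")], 0.9),
    ]
    experience = extract_experience(attempts)
    print(experience["do"])
    # → [('search', 'red fox winter coat'), ('select_reference', 'ref_2')]
    print(privileged_prompt(experience))
    student = [[0.9, 0.1], [0.5, 0.5]]
    teacher = [[0.5, 0.5], [0.5, 0.5]]
    print(per_token_reverse_kl(student, teacher), distillation_loss(student, teacher))
```

Reverse KL evaluated on the student's own samples is one common on-policy distillation objective: it penalizes the student wherever it puts probability mass that the experience-conditioned teacher would not. The paper's actual divergence, experience format, and judge may differ from these stand-ins.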

Models citing this paper: 1

Datasets citing this paper: 1

Spaces citing this paper: 0

Collections including this paper: 0
