Editor's Choice: Evaluating Abstract Intent in Image Editing through Atomic Entity Analysis
Authors: Mor Ventura, Roy Hirsch, Yonatan Bitton, Regev Cohen, Roi Reichart
Published: 2026-05-14 (arXiv: 2605.14842)
Project page: https://venturamor.github.io/EditorsChoice/
Abstract
AI-generated summary: An abstract image editing benchmark and the Entity-Rubrics framework reveal challenges in balancing intent and preservation for abstract instructions, highlighting the need for advanced LLM integration and iterative approaches.
Humans naturally communicate through abstract concepts like "mood". However, current image editing benchmarks focus primarily on explicit, literal commands, leaving abstract instructions largely underexplored. In this work, we first formalize the definition and taxonomy of abstract image editing. To measure instruction-following in this challenging domain, we introduce Entity-Rubrics, a framework that breaks down abstract edits into individual, entity-level assessments and achieves strong correlation with human judgment. Alongside this framework, we contribute AbstractEdit, the first benchmark dedicated to abstract image editing across diverse real-world scenes. Evaluating 11 leading models on this dataset reveals a fundamental challenge: standard architectures struggle to balance intent and preservation, commonly defaulting to under-editing or over-editing. Our analysis demonstrates that driving meaningful improvements relies heavily on integrating advanced LLM text encoders and iterative thinking. Looking forward, our entity-based paradigm can generalize beyond assessment to serve as a reward model, enable models to correctly interpret abstract communication, or highlight specific failures in test-time critique loops. Ultimately, we hope this work serves as a stepping stone toward seamless multimodal interaction, closing the gap between rigid machine execution and the natural, open-ended way humans communicate.
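To make the entity-level idea concrete, here is a minimal sketch of how per-entity judgments might be aggregated into intent and preservation scores. It is an illustration under stated assumptions, not the Entity-Rubrics implementation: the `EntityJudgment` schema, the `score_edit` function, the two ratios, and the 0.5 threshold are all hypothetical, and the abstract does not specify how the framework scores or combines entities.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EntityJudgment:
    """One entity-level check (hypothetical schema, not taken from the paper)."""
    entity: str          # e.g. "sky", "lighting", "face"
    should_change: bool  # True if the abstract instruction implies editing this entity
    changed: bool        # True if a judge decided the entity visibly changed


def score_edit(judgments: List[EntityJudgment], threshold: float = 0.5) -> dict:
    """Aggregate entity-level judgments into intent and preservation scores.

    The threshold is an arbitrary illustrative cut-off, not a published value.
    """
    targets = [j for j in judgments if j.should_change]
    keepers = [j for j in judgments if not j.should_change]

    # Intent: fraction of entities that should have changed and did.
    intent = sum(j.changed for j in targets) / len(targets) if targets else 1.0
    # Preservation: fraction of entities that should have stayed and did.
    preservation = (
        sum(not j.changed for j in keepers) / len(keepers) if keepers else 1.0
    )

    if intent >= threshold and preservation >= threshold:
        label = "balanced"
    elif intent < threshold and preservation >= threshold:
        label = "under-editing"
    elif intent >= threshold and preservation < threshold:
        label = "over-editing"
    else:
        label = "failed"  # neither the intent nor the preservation was met

    return {"intent": intent, "preservation": preservation, "label": label}


# Hypothetical instruction: "make the scene feel gloomier".
result = score_edit([
    EntityJudgment("sky", should_change=True, changed=False),
    EntityJudgment("lighting", should_change=True, changed=False),
    EntityJudgment("face", should_change=False, changed=False),
])
print(result["label"])
```

In this toy case nothing that should change was changed, so the edit is labeled as under-editing. Separating the two ratios is one simple way to expose the intent versus preservation trade-off the abstract describes; a real evaluator would also need a judge model or human annotator to produce the `changed` values.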