Hugging Face Daily Papers · 3 min read

PaintBench: Deterministic Evaluation of Precise Visual Editing

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

📄 paper: https://arxiv.org/abs/2606.00188
💻 code: https://github.com/PaintBench/PaintBench
🤗 dataset: https://hf.co/datasets/PaintBench/PaintBench
🌐 website: https://paintbench.github.io
arxiv:2606.00188

Published on May 29, 2026 · Submitted by Ellis Brown on Jun 4, 2026
Authors: Kai Xu, Ellis Brown, Shrikar Madhu, Rob Fergus, He He, Saining Xie

Abstract

PaintBench presents a scalable benchmark for precise visual editing tasks, revealing low performance across models and identifying key challenges in geometric transformations and structural manipulations.

While current multimodal models are proficient at open-ended visual editing, executing precise single-answer edits remains an important obstacle. To probe this challenge, we introduce PaintBench, a dynamically scalable benchmark targeting 20 fundamental precise visual editing operations across four categories: geometric transformation, structural manipulation, color change, and symbolic reasoning. Procedural generation with configurable complexity enables an effectively infinite, contamination-resistant evaluation suite, and deterministic pixel-level evaluation eliminates reliance on bias-prone judge models. Across 11 image editing models, we find overall low performance, with the current highest-performing industry leader scoring only 17.1% (mIoU). Task decomposition reveals especially challenging operation types (geometric transformation, most structural manipulation, formula-based color change) and model-specific specializations. Fine-grained benchmark diagnostics further show performance degradations induced by scene variations in object count, background complexity, color scheme, and edit-region size. To test generalization of PaintBench scores to applied task performance, we create a procedural, deterministic evaluation for data visualization editing (TinyGrafixBench) and find strong linear correlation with PaintBench scores (R^2 = 0.91, p < 0.001). Altogether, PaintBench provides a rigorous foundation for measuring and driving progress in precise multimodal visual editing.
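
The abstract reports scores as mIoU from a deterministic pixel-level evaluation but does not spell out the metric. As a rough illustration only, the sketch below assumes one plausible reading: for each sample, the intersection is the set of pixels the reference edit changed and the model reproduced exactly, and the union is the set of pixels changed by either the model or the reference. The function names (`changed`, `edit_iou`, `mean_iou`) and the nested-list image representation are hypothetical and are not taken from the PaintBench code.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]   # RGB triple
Image = List[List[Pixel]]      # rows of pixels; all images share one size


def changed(a: Image, b: Image) -> List[List[bool]]:
    """Mark every pixel whose RGB value differs between two images."""
    return [[p != q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def edit_iou(source: Image, output: Image, reference: Image) -> float:
    """IoU between the model's edit and the reference edit (assumed definition).

    Intersection: pixels the reference changed AND the output matches exactly.
    Union: pixels changed by the output or by the reference.
    """
    pred = changed(source, output)
    target = changed(source, reference)
    inter = union = 0
    for y, row in enumerate(target):
        for x, t in enumerate(row):
            if t and output[y][x] == reference[y][x]:
                inter += 1
            if t or pred[y][x]:
                union += 1
    # If neither image differs from the source, the (empty) edit is trivially right.
    return inter / union if union else 1.0


def mean_iou(samples: Sequence[Tuple[Image, Image, Image]]) -> float:
    """Average edit_iou over (source, output, reference) triples."""
    return sum(edit_iou(s, o, r) for s, o, r in samples) / len(samples)


W, R = (255, 255, 255), (255, 0, 0)
source = [[W, W], [W, W]]
reference = [[R, R], [W, W]]   # correct edit: paint the top row red
output = [[R, W], [R, W]]      # model painted the left column instead

score = edit_iou(source, output, reference)
average = mean_iou([(source, output, reference), (source, reference, reference)])
```

In this toy case one pixel is edited correctly and three pixels are touched by either edit, so `score` is 1/3. Because the comparison is exact and rule-based, the same inputs always give the same score, which is the property the abstract contrasts with judge-model evaluation.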
