Hugging Face Daily Papers · June 4, 2026 · 3 min read

PaintBench: Deterministic Evaluation of Precise Visual Editing

#model-release #benchmark #funding #robotics

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2606.00188

Published on May 29 · Submitted by Ellis Brown on Jun 4 · VISIONx @ NYU

Authors: Kai Xu, Ellis Brown, Shrikar Madhu, Rob Fergus, He He, Saining Xie

Abstract

PaintBench presents a scalable benchmark for precise visual editing tasks, revealing low performance across models and identifying key challenges in geometric transformations and structural manipulations.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

While current multimodal models are proficient at open-ended visual editing, executing precise single-answer edits remains an important obstacle. To probe this challenge, we introduce PaintBench, a dynamically scalable benchmark targeting 20 fundamental precise visual editing operations across four categories: geometric transformation, structural manipulation, color change, and symbolic reasoning. Procedural generation with configurable complexity enables an effectively infinite, contamination-resistant evaluation suite, and deterministic pixel-level evaluation eliminates reliance on bias-prone judge models. Across 11 image editing models, we find overall low performance, with the current highest-performing industry leader scoring only 17.1% (mIoU). Task decomposition reveals especially challenging operation types (geometric transformation, most structural manipulation, formula-based color change) and model-specific specializations. Fine-grained benchmark diagnostics further show performance degradations induced by scene variations in object count, background complexity, color scheme, and edit-region size. To test generalization of PaintBench scores to applied task performance, we create a procedural, deterministic evaluation for data visualization editing (TinyGrafixBench) and find strong linear correlation with PaintBench scores (R^2 = 0.91, p < 0.001). Altogether, PaintBench provides a rigorous foundation for measuring and driving progress in precise multimodal visual editing.
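The abstract's combination of seeded procedural generation and deterministic pixel-level scoring can be illustrated with a small sketch. This is not PaintBench's code, and the exact mIoU definition used in the paper is not given on this page. The helper names (`make_scene`, `flip_horizontal`, `iou`, `mean_iou`), the binary-grid scene format, and the choice of a horizontal flip as the reference edit are all assumptions made for illustration.

```python
import random

# Hypothetical scene format: a 2D grid where 1 is a foreground pixel, 0 is background.
Grid = list[list[int]]

def make_scene(seed: int, size: int = 8, n_pixels: int = 10) -> Grid:
    """Procedurally generate a scene; the same seed always yields the same grid."""
    rng = random.Random(seed)
    grid = [[0] * size for _ in range(size)]
    for idx in rng.sample(range(size * size), n_pixels):
        grid[idx // size][idx % size] = 1
    return grid

def flip_horizontal(grid: Grid) -> Grid:
    """A reference edit with exactly one correct answer: mirror every row."""
    return [row[::-1] for row in grid]

def iou(pred: Grid, target: Grid) -> float:
    """Intersection over union of the foreground pixels of two equal-sized grids."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty grids agree perfectly, so score them 1.0 instead of dividing by zero.
    return 1.0 if union == 0 else inter / union

def mean_iou(pairs: list[tuple[Grid, Grid]]) -> float:
    """Average IoU over (prediction, target) pairs."""
    if not pairs:
        raise ValueError("mean_iou needs at least one (prediction, target) pair")
    return sum(iou(p, t) for p, t in pairs) / len(pairs)

if __name__ == "__main__":
    scenes = [make_scene(seed) for seed in range(5)]
    targets = [flip_horizontal(scene) for scene in scenes]
    # A model that returns the input unchanged is scored against the flipped target.
    print("identity baseline mIoU:", mean_iou(list(zip(scenes, targets))))
    # A model that performs the edit exactly scores 1.0.
    print("perfect edit mIoU:", mean_iou(list(zip(targets, targets))))
```

Because both the scene and the target are computed from a seed and a fixed rule, the score needs no judge model, and fresh test cases can be drawn by changing the seed.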

Community

ellisbrown (paper author, paper submitter), Jun 4, 2026:

📄 paper: https://arxiv.org/abs/2606.00188
💻 code: https://github.com/PaintBench/PaintBench
🤗 dataset: https://hf.co/datasets/PaintBench/PaintBench
🌐 website: https://paintbench.github.io

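kai-xu (paper author), Jun 4, 2026:

Video: https://cdn-uploads.huggingface.co/production/uploads/66bbcb6d9dc887ecfbb998b2/p75XamYNxr0UA1UF1G3Ll.mp4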

Get this paper in your agent:

hf papers read 2606.00188

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
