PaintBench: Deterministic Evaluation of Precise Visual Editing
Authors: Kai Xu, Ellis Brown, Shrikar Madhu, Rob Fergus, He He, Saining Xie
Organization: VISIONx @ NYU
Published: 2026-05-29
Paper: https://arxiv.org/abs/2606.00188
Code: https://github.com/PaintBench/PaintBench
Dataset: https://hf.co/datasets/PaintBench/PaintBench
Website: https://paintbench.github.io
Abstract
PaintBench presents a scalable benchmark for precise visual editing tasks, revealing low performance across models and identifying key challenges in geometric transformations and structural manipulations.
While current multimodal models are proficient at open-ended visual editing, executing precise single-answer edits remains an important obstacle. To probe this challenge, we introduce PaintBench, a dynamically scalable benchmark targeting 20 fundamental precise visual editing operations across four categories: geometric transformation, structural manipulation, color change, and symbolic reasoning. Procedural generation with configurable complexity enables an effectively infinite, contamination-resistant evaluation suite, and deterministic pixel-level evaluation eliminates reliance on bias-prone judge models. Across 11 image editing models, we find overall low performance, with the current highest-performing industry leader scoring only 17.1% (mIoU). Task decomposition reveals especially challenging operation types (geometric transformation, most structural manipulation, formula-based color change) and model-specific specializations. Fine-grained benchmark diagnostics further show performance degradations induced by scene variations in object count, background complexity, color scheme, and edit-region size. To test generalization of PaintBench scores to applied task performance, we create a procedural, deterministic evaluation for data visualization editing (TinyGrafixBench) and find strong linear correlation with PaintBench scores (R^2 = 0.91, p < 0.001). Altogether, PaintBench provides a rigorous foundation for measuring and driving progress in precise multimodal visual editing.
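To make the idea of deterministic pixel-level scoring concrete, here is a minimal sketch of an IoU-style metric over edit regions. The abstract does not give the exact mIoU definition PaintBench uses, so everything in it is an assumption made for illustration: the function names `edit_iou` and `mean_iou`, the representation of images as nested lists of RGB tuples, and the rule that a pixel counts as a hit only when it was supposed to change and the output matches the target exactly.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB tuples; all images assumed same size


def edit_iou(source: Image, target: Image, pred: Image) -> float:
    """IoU between the region a model edited and the region it should have edited.

    Hypothetical definition (not taken from the paper):
      union        = pixels that differ from the source in target OR prediction
      intersection = pixels that should change AND match the target exactly
    """
    intersection = 0
    union = 0
    for src_row, tgt_row, pred_row in zip(source, target, pred):
        for s, t, p in zip(src_row, tgt_row, pred_row):
            should_change = t != s
            did_change = p != s
            if should_change or did_change:
                union += 1
                if should_change and p == t:
                    intersection += 1
    # If nothing had to change and nothing did, treat the edit as perfect.
    return 1.0 if union == 0 else intersection / union


def mean_iou(samples: Sequence[Tuple[Image, Image, Image]]) -> float:
    """Average edit_iou over (source, target, prediction) triples."""
    scores = [edit_iou(s, t, p) for s, t, p in samples]
    return sum(scores) / len(scores)


# Toy 2x2 example: the task is to paint the top row red.
W, R = (255, 255, 255), (255, 0, 0)
source = [[W, W], [W, W]]
target = [[R, R], [W, W]]
partial = [[R, W], [R, W]]  # one correct pixel, one missed, one spurious

score_partial = edit_iou(source, target, partial)
score_perfect = edit_iou(source, target, target)
score_mean = mean_iou([(source, target, partial), (source, target, target)])
```

Because the score depends only on pixel comparisons, rerunning it on the same inputs always gives the same number, which is the property the abstract contrasts with judge-model scoring. In the toy case the partial output has one correct pixel out of three pixels in the union, so spurious edits outside the intended region lower the score just as missed pixels do.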