Hugging Face Daily Papers · 3 min read

CurveBench: A Benchmark for Exact Topological Reasoning over Nested Jordan Curves

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.14068

CurveBench: A Benchmark for Exact Topological Reasoning over Nested Jordan Curves

Published on May 13 · Submitted by Amir Mohseni on May 15
Authors: Amirreza Mohseni, Mona Mohammadi, Morteza Saghafian, Naser Talebizadeh Saradari
AI-generated summary

CurveBench presents a benchmark for hierarchical topological reasoning using visual inputs, demonstrating significant challenges in exact topology-aware visual reasoning even with advanced models.

Abstract

We introduce CurveBench, a benchmark for hierarchical topological reasoning from visual input. CurveBench consists of 756 images of pairwise non-intersecting Jordan curves across easy, polygonal, topographic-inspired, maze-like, and dense counting configurations. Each image is annotated with a rooted tree encoding the containment relations between planar regions. We formulate the task as structured prediction: given an image, a model must recover the full rooted containment tree induced by the curves. Despite the visual simplicity of the task, the strongest evaluated model, Gemini 3.1 Pro, achieves only 71.1% tree-generation accuracy on CurveBench-Easy and 19.1% on CurveBench-Hard. We further demonstrate benchmark utility through RLVR-style fine-tuning of open-weight vision-language models. Our trained Qwen3-VL-8B model improves over Qwen3-VL-8B-Thinking from 2.8% to 33.3% tree-generation accuracy on CurveBench-Easy, exceeding GPT-5.4 and Claude Opus 4.5 under our evaluation protocol. The remaining gap, especially on CurveBench-Hard, shows that exact topology-aware visual reasoning remains far from solved.
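The containment-tree target can be illustrated with a toy construction. The sketch below is our own illustration, not the authors' code: it builds the rooted containment tree for pairwise non-intersecting circles, a simple stand-in for the benchmark's Jordan curves. Because the curves never intersect, containment is a strict partial order and each curve's parent is simply its smallest container.

```python
# Illustrative sketch (not the authors' code): containment tree for
# pairwise non-intersecting circles, each given as (x, y, r).
import math

def contains(a, b):
    """True if circle a strictly contains circle b (curves never touch)."""
    ax, ay, ar = a
    bx, by, br = b
    return math.hypot(ax - bx, ay - by) + br < ar

def containment_tree(circles):
    """Parent index for each circle; -1 marks children of the outer region."""
    parents = []
    for i, c in enumerate(circles):
        # All circles that contain c; the parent is the smallest of them.
        containers = [j for j, o in enumerate(circles) if j != i and contains(o, c)]
        parents.append(min(containers, key=lambda j: circles[j][2]) if containers else -1)
    return parents

# Three nested circles plus one disjoint circle:
circles = [(0, 0, 10), (0, 0, 5), (0, 0, 2), (30, 0, 3)]
print(containment_tree(circles))  # [-1, 0, 1, -1]
```

The same parent-of-smallest-container rule generalizes to arbitrary non-intersecting Jordan curves once a containment test (e.g., point-in-polygon) is available.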

Community

We introduce CurveBench, a benchmark for testing whether vision-language models can recover hierarchical region-containment trees from images of non-intersecting Jordan curves. The task targets visual topology and structured reasoning beyond simple object recognition, counting, or OCR.

The Hugging Face collection includes the paper, the CurveBench and CurveBench-Easy datasets, evaluation code, ground-truth generation resources, and fine-tuning artifacts. Our results show that even strong frontier VLMs struggle substantially on the harder settings, while fine-tuned open models improve but remain far from solving the task.
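To make the "tree-generation accuracy" numbers concrete, here is a minimal sketch of an exact-match metric under our own assumptions; the canonical serialization below (sorted nested tuples over unlabeled children) is an illustration, not necessarily the paper's evaluation protocol. A prediction scores only if the entire containment tree is recovered.

```python
# Illustrative exact-match "tree-generation accuracy" sketch.
# Trees are given as {node: [children]} dicts rooted at "root";
# this representation is our assumption, not the benchmark's format.
def canonical(tree):
    """Canonical form: node labels dropped, children sorted recursively."""
    def norm(node):
        return tuple(sorted(norm(c) for c in tree.get(node, [])))
    return norm("root")

def tree_accuracy(predictions, references):
    """Fraction of examples whose predicted tree matches the gold tree exactly."""
    hits = sum(canonical(p) == canonical(r) for p, r in zip(predictions, references))
    return hits / len(references)

gold = {"root": ["a", "d"], "a": ["b"], "b": ["c"]}
pred_good = {"root": ["d", "a"], "a": ["b"], "b": ["c"]}   # same shape, order differs
pred_bad = {"root": ["a"], "a": ["b", "d"], "b": ["c"]}    # "d" nested one level too deep
print(tree_accuracy([pred_good, pred_bad], [gold, gold]))  # 0.5
```

All-or-nothing scoring like this explains why accuracy collapses on dense configurations: a single misplaced region invalidates the whole tree.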


Get this paper in your agent:

hf papers read 2605.14068
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0


Datasets citing this paper 2

Spaces citing this paper 0


Collections including this paper 1

Discussion (0)

