ABACUS: Adapting Unified Foundation Model for Bridging Image Count Understanding and Generation
Authors: Anindya Mondal, Sauradip Nag, Anjan Dutta
Published: 2026-06-22
arXiv: 2606.23835
Project page: https://mondalanindya.github.io/ABACUS/
ABACUS is a unified vision-language model that handles object counting, crowd counting, referring-expression counting, and count-faithful image generation without benchmark-specific training. Our model is built on an existing 3B-parameter unified foundation model and is adapted for object localization tasks through three key innovations: density-aware adaptive zooming with objectness maps for spatial grounding; a boundary-aware count policy, trained via GRPO, to eliminate crop-boundary errors; and a cycle-consistent GRPO strategy in which the understanding branch self-critiques generated outputs, closing the understanding-generation gap without any external annotations. ABACUS achieves state-of-the-art results across seven benchmarks, outperforming both task-specific specialists and larger generalist models.
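To make the cycle-consistent GRPO idea more concrete, here is a minimal sketch of how group-relative advantages could be computed from count-based rewards. It is an illustration under stated assumptions, not the ABACUS implementation: the reward `count_reward`, the helper `group_relative_advantages`, and the sample counts are hypothetical. Only the group normalization (reward minus group mean, divided by group standard deviation) follows the standard GRPO formulation. The sketch uses the population standard deviation, though implementations differ on this choice.

```python
from statistics import mean, pstdev


def count_reward(predicted: int, target: int) -> float:
    """Hypothetical reward: 1.0 for an exact count, decaying with relative error."""
    if target <= 0:
        return 1.0 if predicted == target else 0.0
    return max(0.0, 1.0 - abs(predicted - target) / target)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one sampled group, as GRPO does.

    Each advantage is (reward - group mean) / (group std + eps), so no
    learned value function is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Cycle-consistency sketch: the prompt requests 5 objects, the generation
# branch samples 4 images, and the understanding branch counts the objects
# in each one. The counts below are made-up illustrative values.
target = 5
counted = [5, 4, 7, 5]
rewards = [count_reward(c, target) for c in counted]
advantages = group_relative_advantages(rewards)

for c, r, a in zip(counted, rewards, advantages):
    print(f"counted={c} reward={r:.2f} advantage={a:+.3f}")
```

Samples whose counted objects match the requested number receive positive advantages and the rest receive negative ones, so the training signal comes from the model's own counting ability rather than from external labels.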
