Hugging Face Daily Papers · 4 min read

SpaceDG: Benchmarking Spatial Intelligence under Visual Degradation

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.22536

Published on May 21, 2026 · Submitted by Zhihang Zhong on May 22, 2026
Authors: Xiaolong Zhou, Yifei Liu, Ziyang Gong, Jiarui Li, Qiyue Zhao, Muyao Niu, Yuanyuan Gao, Le Ma, Xue Yang, Hongjie Zhang, Zhihang Zhong
Project page: https://visionary-laboratory.github.io/SpaceDG/
Code: https://github.com/Visionary-Laboratory/SpaceDG

Abstract

AI-generated summary: The SpaceDG dataset and benchmark evaluate how robust the spatial reasoning of multimodal language models is under visual degradations, revealing significant performance gaps and showing that targeted training improves robustness.

Multimodal Large Language Models (MLLMs) have made rapid progress in spatial intelligence, yet existing spatial reasoning benchmarks largely assume pristine visual inputs and overlook the degradations that commonly occur in real-world deployment, such as motion blur, low light, adverse weather, lens distortion, and compression artifacts. This raises a fundamental question: how robust is the spatial intelligence of current MLLMs when visual observations are imperfect? To answer this question, we introduce SpaceDG, the first large-scale dataset for degradation-aware spatial understanding. It is constructed with a physically grounded degradation synthesis engine that embeds the degradation formation process into 3D Gaussian Splatting (3DGS) rendering, enabling realistic simulation of nine degradation types. The resulting dataset contains approximately 1M QA pairs from nearly 1,000 indoor scenes. We further introduce SpaceDG-Bench, a human-verified benchmark with 1,102 questions spanning 11 reasoning categories and 9 visual degradation types, yielding over 10K VQA instances. Evaluating 25 open- and closed-source MLLMs reveals that visual degradations consistently and substantially impair spatial reasoning, exposing a critical robustness gap. Finally, we show that finetuning on SpaceDG markedly improves degradation robustness and can even surpass human performance under degraded conditions without any performance drop on clean images, highlighting the promise of degradation-aware training for robust spatial intelligence.
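The benchmark arithmetic and the notion of a "robustness gap" in the abstract can be made concrete with a rough sketch. The abstract gives only the counts (1,102 questions, 9 degradation types, over 10K instances); whether the clean rendering is counted as a tenth condition is an assumption here, as are the function names, the record format, and the toy data below. None of this is taken from the SpaceDG code or its evaluation protocol.

```python
from collections import defaultdict

NUM_QUESTIONS = 1102     # stated in the abstract
NUM_DEGRADATIONS = 9     # stated in the abstract


def instance_count(include_clean: bool) -> int:
    """Questions times conditions; counting the clean view is an assumption."""
    conditions = NUM_DEGRADATIONS + (1 if include_clean else 0)
    return NUM_QUESTIONS * conditions


def accuracy_by_condition(records):
    """records: iterable of (condition_name, is_correct) pairs (hypothetical format)."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for condition, is_correct in records:
        totals[condition] += 1
        hits[condition] += int(is_correct)
    return {c: hits[c] / totals[c] for c in totals}


def robustness_drop(accuracy, clean_key="clean"):
    """Absolute accuracy lost relative to the clean condition, per degradation."""
    clean = accuracy[clean_key]
    return {c: clean - a for c, a in accuracy.items() if c != clean_key}


# Toy records for illustration only; these are not results from the paper.
toy_records = [
    ("clean", True), ("clean", True), ("clean", True), ("clean", False),
    ("motion_blur", True), ("motion_blur", False),
    ("motion_blur", False), ("motion_blur", False),
]

degraded_only = instance_count(include_clean=False)
with_clean = instance_count(include_clean=True)
toy_drop = robustness_drop(accuracy_by_condition(toy_records))

print(degraded_only, with_clean, toy_drop)
```

Under these assumptions, the 9 degraded conditions alone give 9,918 instances, while adding the clean condition gives 11,020, which is one reading consistent with "over 10K". The per-condition drop is just one simple way to express a robustness gap; the paper may report it differently.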

Community

Paper submitter about 7 hours ago

We introduce SpaceDG, the first large-scale dataset for degradation-aware spatial intelligence, and SpaceDG-Bench, a human-verified benchmark for evaluating MLLMs under visual degradations 🔥


Get this paper in your agent:

hf papers read 2605.22536
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
