GGT-100K: Generative Ground Truth for Generalizable Real-World Image Restoration
Authors: Xiangtao Kong, Jixin Zhao, Lingchen Sun, Rongyuan Wu, Lei Zhang
Published: 2026-05-29 (arXiv:2605.31039)
Project page: https://polyu-vclab.github.io/GGT-100K/
Code: https://github.com/PolyU-VCLab/GGT-100K
Abstract
AI-generated summary: Generative multimodal foundation models are used to create high-quality training data for image restoration, improving model generalization across diverse real-world scenarios.
Real-world image restoration (IR) is bottlenecked by the scarcity of high-quality paired training data. Synthetic datasets are abundant but often fail to model real-world degradations, while real-world paired datasets are expensive and difficult to capture. As a result, IR models trained on these datasets show limited generalization in real-world scenarios. In this work, we propose Generative Ground Truth (GGT), which uses generative multimodal foundation models (MFMs) to produce high-quality (HQ) targets from real-world low-quality (LQ) images. We first conduct a systematic evaluation of nine state-of-the-art MFMs, including Nano-Banana-2 and GPT-Image-2, on images covering various scenes and degradation types. The results demonstrate that Nano-Banana-2 with VLM-based adaptive prompting is the most capable of synthesizing perceptually realistic and content-faithful HQ targets, which can serve as the GGT for the LQ input. We then employ Nano-Banana-2 to build a GGT synthesis pipeline, which involves multi-stage quality control to ensure data reliability, and construct GGT-100K, an LQ-HQ paired dataset comprising 103,707 training pairs and covering diverse scenes and complex real-world degradations. We also establish a test set of 500 image pairs. Extensive experiments show that GGT-100K consistently improves the real-world generalization of a wide range of IR models, with particularly strong benefits for finetuning generative models for IR tasks. Our results suggest that MFMs can serve as practical tools for restoration-oriented data generation, and that GGT-100K is a useful resource for expanding the generalization boundaries of real-world IR models.
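The abstract describes the synthesis pipeline only at a high level: a VLM produces an adaptive prompt for each LQ image, an MFM generates the HQ target, and multi-stage quality control filters the results. The following Python sketch shows one way such a loop could be organized. It is an illustration under stated assumptions, not the authors' implementation: `Pair`, `synthesize_pairs`, `make_prompt`, `restore`, and `checks` are hypothetical names, and the concrete prompting strategy and quality-control stages are not specified in the abstract, so they are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence


@dataclass
class Pair:
    """One candidate LQ-HQ training pair (hypothetical structure)."""
    lq: str       # identifier or path of the real-world low-quality input
    hq: str       # identifier or path of the generated high-quality target
    prompt: str   # prompt used to generate the target


def synthesize_pairs(
    lq_images: Iterable[str],
    make_prompt: Callable[[str], str],
    restore: Callable[[str, str], str],
    checks: Sequence[Callable[[Pair], bool]],
) -> List[Pair]:
    """Generate HQ targets and keep only pairs that pass every check.

    make_prompt stands in for VLM-based adaptive prompting, restore for
    the generative model call, and checks for the quality-control stages.
    All three are assumptions supplied by the caller.
    """
    kept: List[Pair] = []
    for lq in lq_images:
        prompt = make_prompt(lq)          # adapt the prompt to this image
        hq = restore(lq, prompt)          # generate the candidate HQ target
        pair = Pair(lq=lq, hq=hq, prompt=prompt)
        # all() short-circuits, so later (possibly costlier) checks are
        # skipped as soon as an earlier stage rejects the pair.
        if all(check(pair) for check in checks):
            kept.append(pair)
    return kept
```

Ordering `checks` from cheapest to most expensive keeps rejected pairs from incurring the cost of later stages, which is one plausible reason to structure quality control in multiple stages.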