Hugging Face Daily Papers · 6 min read

GGT-100K: Generative Ground Truth for Generalizable Real-World Image Restoration

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Project page: https://polyu-vclab.github.io/GGT-100K/
Code: https://github.com/PolyU-VCLab/GGT-100K
arxiv:2605.31039


Published on May 29, 2026 · Submitted by PolyU_VCLab on Jun 1, 2026
Authors: Xiangtao Kong, Jixin Zhao, Lingchen Sun, Rongyuan Wu, Lei Zhang

AI-generated summary

Generative multimodal foundation models are used to create high-quality training data for image restoration, improving model generalization across diverse real-world scenarios.

Abstract

Real-world image restoration (IR) is bottlenecked by the scarcity of high-quality paired training data. Synthetic datasets are abundant but often fail to model real-world degradations, while real-world paired datasets are expensive and difficult to capture. As a result, IR models trained on these datasets show limited generalization in real-world scenarios. In this work, we propose Generative Ground Truth (GGT) by using generative multimodal foundation models (MFMs) to produce high-quality (HQ) targets from real-world low-quality (LQ) images. We first conduct a systematic evaluation of nine state-of-the-art MFMs, including Nano-Banana-2 and GPT-Image-2, on images of various scenes and degradation types. The results demonstrate that Nano-Banana-2 with VLM-based adaptive prompting shows the highest capability to synthesize perceptually realistic and content-faithful HQ targets, which can serve as the GGT for the LQ input. We then employ Nano-Banana-2 to build a GGT synthesis pipeline, which involves multi-stage quality control to ensure data reliability, and construct GGT-100K, an LQ-HQ paired dataset comprising 103,707 training pairs and covering diverse scenes and complex real-world degradations. A test set of 500 image pairs is also established. Extensive experiments show that GGT-100K consistently improves the real-world generalization of a wide range of IR models, with particularly strong benefits for finetuning generative models for IR tasks. Our results suggest that MFMs can serve as practical tools for restoration-oriented data generation, and GGT-100K is a useful resource to expand the generalization boundaries of real-world IR models.
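The abstract describes the pipeline only at a high level (adaptive prompting, generation of an HQ target, multi-stage quality control), so the following is a minimal sketch of that general shape rather than the authors' implementation. Every name in it (`build_pairs`, `toy_describe`, `toy_restore`, `same_size`, `content_faithful`) is hypothetical, images are stand-in lists of floats, and the toy checks and thresholds are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence

# Hypothetical stand-in for real image data: a flat list of pixel values in [0, 1].
Image = Sequence[float]


@dataclass
class Pair:
    lq: Image      # real-world low-quality input
    hq: Image      # generated high-quality target
    prompt: str    # prompt that was used to produce the target


def build_pairs(
    lq_images: Iterable[Image],
    describe: Callable[[Image], str],                  # stand-in for adaptive prompting
    restore: Callable[[Image, str], Image],            # stand-in for the generative model
    checks: Sequence[Callable[[Image, Image], bool]],  # quality-control stages, run in order
) -> List[Pair]:
    """Generate an HQ target per LQ image and keep only pairs passing every check."""
    pairs: List[Pair] = []
    for lq in lq_images:
        prompt = describe(lq)
        hq = restore(lq, prompt)
        # all() short-circuits, so later stages are skipped once one stage fails.
        if all(check(lq, hq) for check in checks):
            pairs.append(Pair(lq=lq, hq=hq, prompt=prompt))
    return pairs


# --- Toy stand-ins (assumptions, not real models) ---

def toy_describe(img: Image) -> str:
    # Pick a prompt from a crude image statistic (value range).
    return "denoise" if max(img) - min(img) > 0.5 else "sharpen"


def toy_restore(img: Image, prompt: str) -> Image:
    # Pull every value halfway toward the mean; a real model would go here.
    mean = sum(img) / len(img)
    return [0.5 * (p + mean) for p in img]


def same_size(lq: Image, hq: Image) -> bool:
    # Stage 1: the target must match the input's size.
    return len(lq) == len(hq)


def content_faithful(lq: Image, hq: Image, tol: float = 0.3) -> bool:
    # Stage 2: reject targets that drift too far from the input anywhere.
    return max(abs(a - b) for a, b in zip(lq, hq)) <= tol


lq_images = [
    [0.0, 1.0, 0.0, 1.0],
    [0.4, 0.5, 0.6, 0.5],
    [0.0, 0.0, 0.0, 1.0],  # this one changes too much and is filtered out
]
pairs = build_pairs(lq_images, toy_describe, toy_restore, [same_size, content_faithful])
```

Passing the describe, restore, and check steps in as callables keeps the filtering loop independent of any particular model or metric, which is convenient when several candidate generators are being compared.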


Get this paper in your agent:

hf papers read 2605.31039
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
