Hugging Face Daily Papers · June 1, 2026 · 6 min read

GGT-100K: Generative Ground Truth for Generalizable Real-World Image Restoration

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Real-world image restoration (IR) is bottlenecked by the scarcity of high-quality paired training data. Synthetic datasets are abundant but often fail to model real-world degradations, while real-world paired datasets are expensive and difficult to capture. As a result, IR models trained on these datasets show limited generalization in real-world scenarios. In this work, we propose Generative Ground Truth (GGT) by using generative multimodal foundation models (MFMs) to produce high-quality (HQ) targets from real-world low-quality (LQ) images. We first conduct a systematic evaluation of nine state-of-the-art MFMs, including Nano-Banana-2 and GPT-Image-2, on images of various scenes and degradation types. The results demonstrate that Nano-Banana-2 with VLM-based adaptive prompting shows the highest capability to synthesize perceptually realistic and content-faithful HQ targets, which can serve as the GGT for the LQ input. We then employ Nano-Banana-2 to build a GGT synthesis pipeline with multi-stage quality control and construct GGT-100K, an LQ-HQ paired dataset comprising 103,707 training pairs and covering diverse scenes and complex real-world degradations. A test set of 500 image pairs is also established. 
Extensive experiments show that GGT-100K consistently improves the real-world generalization of a wide range of IR models, with particularly strong benefits for finetuning generative models for IR tasks. Our results suggest that MFMs can serve as practical tools for restoration-oriented data generation, and GGT-100K is a useful resource to expand the generalization boundaries of real-world IR models.

Project page: https://polyu-vclab.github.io/GGT-100K/

Code: https://github.com/PolyU-VCLab/GGT-100K
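The abstract mentions multi-stage quality control over generated LQ-HQ pairs but does not spell out the stages. The following is only a minimal sketch of how such a staged filter could be organized, not the authors' pipeline: the stage names (`geometry`, `fidelity`, `quality`), the score fields, and the thresholds are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class CandidatePair:
    """One LQ input and its generated HQ target (all fields are illustrative)."""
    name: str
    lq_size: tuple[int, int]   # (width, height) of the low-quality input
    hq_size: tuple[int, int]   # (width, height) of the generated target
    fidelity: float            # hypothetical content-consistency score in [0, 1]
    quality: float             # hypothetical perceptual-quality score in [0, 1]


def same_aspect_ratio(pair: CandidatePair, tol: float = 0.01) -> bool:
    """Reject targets whose aspect ratio drifted away from the input's."""
    lq_ratio = pair.lq_size[0] / pair.lq_size[1]
    hq_ratio = pair.hq_size[0] / pair.hq_size[1]
    return abs(lq_ratio - hq_ratio) <= tol


# Ordered stages: cheap checks first, score thresholds afterwards.
# The thresholds 0.8 and 0.7 are arbitrary assumptions, not values from the paper.
STAGES: list[tuple[str, Callable[[CandidatePair], bool]]] = [
    ("geometry", same_aspect_ratio),
    ("fidelity", lambda p: p.fidelity >= 0.8),
    ("quality", lambda p: p.quality >= 0.7),
]


def run_quality_control(pairs: Iterable[CandidatePair], stages=STAGES):
    """Return (kept pairs, {rejected name: first stage that failed})."""
    kept: list[CandidatePair] = []
    rejected: dict[str, str] = {}
    for pair in pairs:
        for stage_name, check in stages:
            if not check(pair):
                rejected[pair.name] = stage_name
                break
        else:
            # No stage rejected the pair, so it enters the dataset.
            kept.append(pair)
    return kept, rejected


candidates = [
    CandidatePair("a", (512, 512), (1024, 1024), fidelity=0.93, quality=0.88),
    CandidatePair("b", (512, 512), (1024, 768), fidelity=0.95, quality=0.90),
    CandidatePair("c", (640, 480), (1280, 960), fidelity=0.55, quality=0.92),
    CandidatePair("d", (640, 480), (1280, 960), fidelity=0.90, quality=0.40),
]
kept, rejected = run_quality_control(candidates)
print([p.name for p in kept])  # ['a']
print(rejected)                # {'b': 'geometry', 'c': 'fidelity', 'd': 'quality'}
```

Recording the first failing stage for each rejected pair makes it easy to see where candidates are lost, which helps when tuning thresholds or prompts in a pipeline of this shape.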

arxiv:2605.31039

Published on May 29

Submitted by PolyU_VCLab (VCLab) on Jun 1

Authors: Xiangtao Kong, Jixin Zhao, Lingchen Sun, Rongyuan Wu, Lei Zhang

AI-generated summary: Generative multimodal foundation models are used to create high-quality training data for image restoration, improving model generalization across diverse real-world scenarios.