Youssef Aboelwafa, Hicham G. Elmongui, Marwan Torki (Alexandria University)
arXiv:2605.12556 · Code: https://github.com/YoussefAboelwafa/M2Retinexformer
M2Retinexformer: Multi-Modal Retinexformer for Low-Light Image Enhancement
Abstract
AI-generated summary
A multi-modal deep learning framework enhances low-light images by integrating depth cues, luminance priors, and semantic features through cross-attention fusion and adaptive gating mechanisms.
Low-light image enhancement is challenging due to complex degradations, including amplified noise, artifacts, and color distortion. While Retinex-based deep learning methods have achieved promising results, they primarily rely on single-modality RGB information. We propose M2Retinexformer (Multi-Modal Retinexformer), a novel framework that extends Retinexformer by incorporating depth cues, luminance priors, and semantic features within a progressive refinement pipeline. Depth provides geometric context that is invariant to lighting variations, while luminance and semantic features offer explicit guidance on brightness distribution and scene understanding. Modalities are extracted at multiple scales and fused through cross-attention, with adaptive gating dynamically balancing illumination-guided self-attention and cross-attention based on the reliability of auxiliary cues. Evaluations on the LOL, SID, SMID, and SDSD benchmarks demonstrate overall improvements over Retinexformer and recent state-of-the-art methods. Code and pretrained weights are available at https://github.com/YoussefAboelwafa/M2Retinexformer
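The adaptive gating described above can be illustrated with a minimal sketch: a sigmoid gate, computed from the concatenated self-attention and cross-attention outputs, blends the two branches per token. This is a hypothetical stand-in for the paper's mechanism, not its actual implementation; the shapes, the single-head attention, and the gate parameterization (`w_gate`, `b_gate`) are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Single-head scaled dot-product attention over the token dimension."""
    d = q.shape[-1]
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(d)
    return softmax(scores, axis=-1) @ v

def gated_fusion(rgb_feat, aux_feat, w_gate, b_gate):
    """Blend self-attention on RGB features with cross-attention to
    auxiliary cues via a per-token sigmoid gate (illustrative only)."""
    self_out = attention(rgb_feat, rgb_feat, rgb_feat)   # self-attention on RGB tokens
    cross_out = attention(rgb_feat, aux_feat, aux_feat)  # queries from RGB, keys/values from aux modality
    # gate estimates auxiliary-cue reliability from both branch outputs
    g = 1.0 / (1.0 + np.exp(-(np.concatenate([self_out, cross_out], -1) @ w_gate + b_gate)))
    return g * cross_out + (1.0 - g) * self_out

rng = np.random.default_rng(0)
n, d = 16, 8                       # 16 tokens, 8 channels
rgb = rng.standard_normal((n, d))
aux = rng.standard_normal((n, d))  # e.g. depth/luminance/semantic tokens
fused = gated_fusion(rgb, aux, rng.standard_normal((2 * d, d)), 0.0)
print(fused.shape)  # (16, 8)
```

When the gate saturates toward 0, the block falls back to pure RGB self-attention, which is the desired behavior when auxiliary cues (e.g. a noisy depth estimate) are unreliable.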