Stable-Layers: Fine-Tuning Image Layer Decomposition Models with VLM-Scored Reinforcement Learning
Authors: Ciara Rowles, Reshinth Adithyan, Nikhil Pinnaparaju, Vikram Voleti, Mark Boss (Stability AI)
Published on May 28, 2026 · Submitted by Ciara on Jun 4, 2026
Abstract
Stable-Layers uses reinforcement learning with vision-language model feedback to improve layer decomposition without paired data, employing Flow-GRPO and LoRA adaptation for optimized policy training.
We present Stable-Layers, a reinforcement learning framework that eliminates the need for paired supervision by fine-tuning a pretrained layer decomposition model using only feedback from a vision-language model (VLM). Starting from Qwen-Image-Layered, we apply Flow-GRPO with LoRA adaptation, sampling multiple candidate decompositions per image, scoring them with a VLM, and optimising the policy from group-relative advantages. The key challenge lies in designing a reliable reward signal: VLMs scoring samples in isolation tend to compress their judgements into a narrow band, leaving GRPO with little within-group variance to learn from. We address this with a two-stage evaluation pipeline that pairs structured per-sample scoring across five edit-centric criteria with a grid-based calibration step in which the VLM re-scores all candidates side-by-side. Stable-Layers produces decompositions with stronger layer separation, fewer blank or artifact-heavy layers, and lower per-layer reconstruction error on the Crello dataset compared to the base model.
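To make the reward-compression problem concrete, the sketch below standardises rewards within one group of candidates, in the way GRPO-style methods commonly compute group-relative advantages. It is only an illustration: the function name, the `eps` constant, and both score lists are hypothetical, and the abstract does not specify the exact normalisation used by Flow-GRPO or by Stable-Layers.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardise rewards within one group of candidates for the same image.

    Assumed form: (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical VLM scores for four candidate decompositions of one image.
# Scored in isolation, the judgements collapse onto a single value.
isolated = [7.0, 7.0, 7.0, 7.0]
# Re-scored side by side, the same candidates are pulled apart.
calibrated = [4.0, 6.5, 7.0, 9.0]

adv_isolated = group_relative_advantages(isolated)
adv_calibrated = group_relative_advantages(calibrated)

print("isolated  :", adv_isolated)
print("calibrated:", [round(a, 3) for a in adv_calibrated])
```

When every candidate in a group receives the same score, all advantages are zero and that group contributes no gradient. Once the scores are separated, the better candidates receive positive advantages and the worse ones negative advantages, which is the within-group variance the abstract says GRPO needs.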
Community
We present Stable-Layers, a reinforcement learning framework that eliminates the need for paired supervision by fine-tuning a pretrained layer decomposition model using only feedback from a vision-language model (VLM). Starting from Qwen-Image-Layered, we apply Flow-GRPO with LoRA adaptation, sampling multiple candidate decompositions per image, scoring them with a VLM, and optimising the policy from group-relative advantages.
The key challenge lies in designing a reliable reward signal: VLMs scoring samples in isolation tend to compress their judgements into a narrow band, leaving GRPO with little within-group variance to learn from. We address this with a two-stage evaluation pipeline that pairs structured per-sample scoring across five edit-centric criteria with a grid-based calibration step in which the VLM re-scores all candidates side-by-side. Trained entirely on unlabelled images, Stable-Layers produces decompositions with stronger layer separation, fewer blank or artifact-heavy layers, and lower per-layer reconstruction error compared to the base model.
Project Page: https://stability-ai.github.io/stable-layers.github.io/
arXiv: arxiv.org/abs/2605.30257