Hugging Face Daily Papers · May 20, 2026 · 4 min read

Overcoming Catastrophic Forgetting in Visual Continual Learning with Reinforcement Fine-Tuning

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.




arxiv:2605.09640


Published on May 10

Submitted by HanzhongGuo on May 20


Authors: Meng Lou, Hanzhong Guo, Linwei Chen, Yizhou Yu

Abstract

AI-generated summary: Reinforcement Fine-Tuning suffers from catastrophic forgetting in visual continual learning, which is addressed through Retention-aware Policy Optimization that uses trajectory-level reward shaping and cross-task advantage normalization.

Recent studies suggest that Reinforcement Fine-Tuning (RFT) is inherently more resilient to catastrophic forgetting than Supervised Fine-Tuning (SFT). However, whether RFT (e.g., GRPO) can effectively overcome forgetting in challenging visual continual learning settings, such as class-incremental learning (CIL) and domain-incremental learning (DIL), remains an open problem. Through a pilot study, we confirm that while RFT consistently outperforms SFT, it still suffers from non-negligible forgetting. We empirically trace this bottleneck to Trajectory-level Drift Agnosticism: among candidate rollouts achieving identical task rewards, the KL divergence from the preceding-task policy varies substantially, which strongly correlates with catastrophic forgetting across sequential tasks. Motivated by this insight, we propose Retention-aware Policy Optimization (RaPO), a simple yet effective RFT method that explicitly mitigates forgetting through trajectory-level reward shaping. Specifically, RaPO comprises two core components: (1) Retention Reward that converts trajectory-level distribution drift into a continuous reward signal, preferentially reinforcing knowledge-preserving rollouts within each group; (2) Cross-Task Advantage Normalization (CTAN), which maintains a persistent exponential moving average of reward statistics across task boundaries to stabilize the optimization progress during continual learning. Leveraging the free-form textual generalization of MLLMs, we comprehensively evaluate RaPO across five visual continual learning settings. Extensive experiments demonstrate that RaPO achieves leading performance, substantially reducing catastrophic forgetting while preserving strong plasticity. To the best of our knowledge, this work represents the first systematic exploration of RFT in visual continual learning, offering insights that we hope will inspire future research.
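The two components named in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the exponential form of the retention reward, the `beta` and `weight` parameters, the EMA `momentum`, and all function and class names are hypothetical and should not be read as the authors' implementation. The sketch only shows the general idea: rollouts with the same task reward are ranked by how little they drift from the preceding-task policy, and advantages are normalized with running statistics that persist across task boundaries instead of per-group statistics.

```python
import math
from dataclasses import dataclass


def trajectory_kl(logp_current, logp_previous):
    """Single-sample estimate of sequence-level KL(current || previous).

    Both arguments are per-token log-probabilities of the SAME rollout,
    which is assumed to have been sampled from the current policy.
    """
    return sum(c - p for c, p in zip(logp_current, logp_previous))


def retention_reward(kl, beta=1.0):
    """ASSUMED shaping: map drift to (0, 1]; zero drift yields 1.0.

    Negative single-sample estimates are clamped to zero drift.
    """
    return math.exp(-beta * max(kl, 0.0))


@dataclass
class RunningRewardStats:
    """EMA of reward mean and variance, never reset at task boundaries."""
    momentum: float = 0.9  # hypothetical value
    mean: float = 0.0
    var: float = 1.0
    initialized: bool = False

    def update(self, rewards):
        batch_mean = sum(rewards) / len(rewards)
        batch_var = sum((r - batch_mean) ** 2 for r in rewards) / len(rewards)
        if not self.initialized:
            # First group seeds the running statistics directly.
            self.mean, self.var, self.initialized = batch_mean, batch_var, True
        else:
            m = self.momentum
            self.mean = m * self.mean + (1 - m) * batch_mean
            self.var = m * self.var + (1 - m) * batch_var

    def normalize(self, rewards, eps=1e-6):
        std = math.sqrt(self.var + eps)
        return [(r - self.mean) / std for r in rewards]


def shaped_advantages(task_rewards, kls, stats, weight=0.5, beta=1.0):
    """Add the retention term to each task reward, then normalize with the EMA."""
    shaped = [r + weight * retention_reward(k, beta)
              for r, k in zip(task_rewards, kls)]
    stats.update(shaped)
    return stats.normalize(shaped)


if __name__ == "__main__":
    stats = RunningRewardStats()
    # Two rollouts with identical task reward but different drift.
    print(shaped_advantages([1.0, 1.0], [0.1, 2.0], stats))
```

In this sketch the low-drift rollout receives the larger advantage even though both rollouts earn the same task reward, which is the behavior the abstract describes as preferentially reinforcing knowledge-preserving rollouts. Reusing the same `RunningRewardStats` object when training moves to the next task is what stands in for the cross-task normalization.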


Community

Alllann (paper submitter, about 3 hours ago): A novel reinforcement learning method for continual learning.


Get this paper in your agent:

hf papers read 2605.09640

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

