Sparse Autoencoders enable Robust and Interpretable Fine-tuning of CLIP models
Authors: Fabian Morelli, Arnas Uselis, Ankit Sonthalia, Seong Joon Oh
Published: 2026-05-15 (arXiv: 2605.15961)
Abstract
AI-generated summary: SAE-FT enables robust fine-tuning of vision-language models by regularizing visual representations through sparse autoencoder constraints, maintaining performance while improving robustness against distribution shifts.
Large-scale pre-trained vision-language models like CLIP demonstrate remarkable zero-shot performance across diverse tasks. However, fine-tuning these models to improve downstream performance often degrades robustness against distribution shifts. Recent approaches have attempted to mitigate this trade-off, but often rely on computationally expensive text guidance. We propose a novel method for robust fine-tuning, SAE-FT, which operates only on the model's visual representations. SAE-FT regularizes changes to these representations by penalizing the addition and removal of semantically meaningful features identified by a Sparse Autoencoder trained on the pre-trained model. This constraint prevents catastrophic forgetting and makes the fine-tuning process interpretable, enabling direct analysis of semantic changes. SAE-FT is both mechanistically transparent and computationally efficient, matching or exceeding state-of-the-art performance on ImageNet and its associated distribution shift benchmarks. Code is publicly available at: https://github.com/Fabian-Mor/sae-ft.
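The abstract describes the regularizer only in words, so the following is a minimal sketch of one way such a penalty could look, not the authors' implementation. It assumes a ReLU sparse-autoencoder encoder and an L1-style penalty on activation differences, split into an "added" part (features that become more active after fine-tuning) and a "removed" part (features that become less active). The function names, the weights `w_add` and `w_remove`, and the random toy data are all hypothetical; the released repository may define the loss differently.

```python
import numpy as np


def sae_encode(x, W_enc, b_enc):
    """Assumed SAE encoder form: z = relu(x @ W_enc + b_enc)."""
    return np.maximum(x @ W_enc + b_enc, 0.0)


def feature_change_penalty(x_pre, x_ft, W_enc, b_enc, w_add=1.0, w_remove=1.0):
    """Penalize SAE features gained or lost between two sets of embeddings.

    x_pre: visual embeddings from the frozen pre-trained encoder, shape (n, d)
    x_ft:  embeddings of the same images from the model being fine-tuned, shape (n, d)
    Returns the mean per-sample penalty plus the elementwise added/removed maps.
    """
    z_pre = sae_encode(x_pre, W_enc, b_enc)
    z_ft = sae_encode(x_ft, W_enc, b_enc)
    added = np.maximum(z_ft - z_pre, 0.0)    # activation that increased
    removed = np.maximum(z_pre - z_ft, 0.0)  # activation that decreased
    per_sample = w_add * added.sum(axis=1) + w_remove * removed.sum(axis=1)
    return per_sample.mean(), added, removed


# Toy data standing in for real CLIP embeddings and a trained SAE (hypothetical).
rng = np.random.default_rng(0)
d, k, n = 8, 32, 4                      # embedding dim, SAE width, batch size
W_enc = rng.normal(size=(d, k)) / np.sqrt(d)
b_enc = np.zeros(k)
x_pre = rng.normal(size=(n, d))
x_ft = x_pre + 0.1 * rng.normal(size=(n, d))  # simulated drift from fine-tuning

penalty, added, removed = feature_change_penalty(x_pre, x_ft, W_enc, b_enc)
print(f"feature-change penalty: {penalty:.4f}")
```

In a training loop, a term like this would presumably be added to the task loss with a weighting coefficient. The `added` and `removed` maps can be inspected per feature, which is the kind of analysis of semantic changes the abstract refers to.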