Hugging Face Daily Papers · May 18, 2026 · 3 min read

Sparse Autoencoders enable Robust and Interpretable Fine-tuning of CLIP models

Authors: Fabian Morelli, Arnas Uselis, Ankit Sonthalia, Seong Joon Oh

arxiv:2605.15961

Published on May 15

Submitted by Arnas Uselis on May 18

Abstract

AI-generated summary: SAE-FT enables robust fine-tuning of vision-language models by regularizing visual representations through sparse autoencoder constraints, maintaining performance while improving robustness against distribution shifts.

Large-scale pre-trained vision-language models like CLIP demonstrate remarkable zero-shot performance across diverse tasks. However, fine-tuning these models to improve downstream performance often degrades robustness against distribution shifts. Recent approaches have attempted to mitigate this trade-off, but often rely on computationally expensive text-guidance. We propose a novel method for robust fine-tuning, SAE-FT, which operates only on the model's visual representations. SAE-FT regularizes changes to these representations by penalizing the addition and removal of semantically meaningful features identified by a Sparse Autoencoder trained on the pre-trained model. This constraint prevents catastrophic forgetting and makes the fine-tuning process interpretable, enabling direct analysis of semantic changes. SAE-FT is both mechanistically transparent and computationally efficient, matching or exceeding state-of-the-art performance on ImageNet and its associated distribution shift benchmarks. Code is publicly available at: https://github.com/Fabian-Mor/sae-ft.
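The abstract describes the regularizer only in words: changes to the visual representation are penalized when they add or remove features found by a Sparse Autoencoder trained on the pre-trained model. The following is a minimal sketch of one way such a penalty could look, not the authors' implementation. The ReLU encoder form, the names `sae_encode` and `feature_change_penalty`, and the `add_weight` / `remove_weight` parameters are all assumptions made for illustration; the actual loss is defined in the paper and the linked repository.

```python
from typing import List, Sequence

Vector = List[float]


def sae_encode(x: Sequence[float],
               weights: Sequence[Sequence[float]],
               bias: Sequence[float]) -> Vector:
    """Toy SAE encoder (assumed form): f = max(0, W x + b)."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def feature_change_penalty(pre: Sequence[float],
                           ft: Sequence[float],
                           weights: Sequence[Sequence[float]],
                           bias: Sequence[float],
                           add_weight: float = 1.0,
                           remove_weight: float = 1.0) -> float:
    """Hypothetical penalty on SAE feature changes between two embeddings.

    `pre` is the frozen pre-trained model's embedding of an image and `ft`
    is the fine-tuned model's embedding of the same image. Both pass through
    the same frozen SAE encoder, so the comparison is made feature by feature.
    """
    f_pre = sae_encode(pre, weights, bias)
    f_ft = sae_encode(ft, weights, bias)
    # Activation gained relative to the pre-trained model ("addition").
    added = sum(max(0.0, b - a) for a, b in zip(f_pre, f_ft))
    # Activation lost relative to the pre-trained model ("removal").
    removed = sum(max(0.0, a - b) for a, b in zip(f_pre, f_ft))
    return add_weight * added + remove_weight * removed


# Tiny example: a 2-d embedding and a 3-feature SAE dictionary.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
b = [0.0, 0.0, 0.0]
penalty = feature_change_penalty([1.0, 2.0], [0.5, 3.0], W, b)
```

In a training loop, a term like this would be added to the task loss with some coefficient. Keeping the added and removed parts separate is what would allow per-feature inspection of which concepts fine-tuning introduced or suppressed, in the spirit of the interpretability claim above.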

Community

Gigglingface

Paper submitter about 14 hours ago

Code: https://github.com/Fabian-Mor/sae-ft
