Hugging Face Daily Papers · 4 min read

ICA Lens: Interpreting Language Models Without Training Another Dictionary

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2606.11722

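Published on Jun 10 · Submitted by Feijiang Han on Jun 11
Authors: Sida Liu, Feijiang Han
Project page: https://liusida.github.io/ica-lens-paper/
Code: https://github.com/liusida/ica-lens-paper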

Abstract

Independent component analysis (ICA) is revived as an efficient method for discovering interpretable directions in language model representations, offering a faster alternative to sparse autoencoder training while maintaining competitive performance in probing tasks.

Finding interpretable directions in language-model representations is critical for understanding and controlling model behavior. Sparse autoencoders (SAEs) have become the standard tool for this purpose, but using them as the default first lens often requires training, storing, and evaluating large overcomplete dictionaries. This bottleneck limits rapid exploration and raises a fundamental question: how much interpretable structure is already visible from activation geometry before training another neural dictionary? Our intuition is simple: many interpretable directions are selective on tokens, and these directions should look less Gaussian than random directions. We therefore revisit independent component analysis (ICA), a classical method for finding non-Gaussian directions, as a compact lens for language-model interpretability. We find that ICA has been underestimated for LLM interpretability, because prior uses often relied on off-the-shelf ICA implementations that are brittle on LLM activations and lacked systematic tools for inspecting and evaluating the recovered directions. To bridge these gaps, we introduce ICALens, the first practical workflow for stable, efficient, and auditable ICA analysis of LLM representations. It combines an optimized GPU-parallel FastICA pipeline with LLM-specific stability recipes and better fitting diagnostics, enabling efficient and reliable layer-wise analysis. Across GPT-2 Small, Gemma 2 2B, and Qwen 3.5 2B Base, ICALens efficiently recovers compact, human-interpretable directions without per-layer gradient-based dictionary training. On SAEBench, ICA is competitive with public SAEs in sparse probing and outperforms them in targeted probe perturbation under small-to-medium budgets. These results suggest that ICA should not be viewed as a weak baseline, but as an efficient and complementary first lens for exploring language-model representations.
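To make the idea of ICA as a search for non-Gaussian directions concrete, here is a minimal NumPy sketch of deflationary FastICA with a tanh contrast, run on synthetic data. It is not the paper's GPU-parallel pipeline and includes none of its stability recipes or diagnostics. The function names (`whiten`, `fast_ica`), the Laplace-distributed sources, and the random mixing matrix are all assumptions chosen for illustration.

```python
import numpy as np


def whiten(x):
    """Center the data and rescale it so its covariance is the identity."""
    x = x - x.mean(axis=0)
    cov = x.T @ x / x.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    return x @ eigvecs / np.sqrt(eigvals)


def fast_ica(z, n_components, n_iter=200, tol=1e-6, seed=0):
    """Deflationary FastICA (tanh contrast) on already-whitened data z."""
    rng = np.random.default_rng(seed)
    dim = z.shape[1]
    w_all = np.zeros((n_components, dim))
    for k in range(n_components):
        w = rng.normal(size=dim)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            g = np.tanh(z @ w)
            g_prime = 1.0 - g ** 2
            # Fixed-point update: E[z g(w.z)] - E[g'(w.z)] w
            w_new = (z * g[:, None]).mean(axis=0) - g_prime.mean() * w
            # Deflation: project out the directions found so far
            w_new -= w_all[:k].T @ (w_all[:k] @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if converged:
                break
        w_all[k] = w
    return w_all


# Synthetic stand-in for activations: heavy-tailed sources, linearly mixed.
rng = np.random.default_rng(0)
sources = rng.laplace(size=(20000, 3))
mixing = rng.normal(size=(3, 3))
activations = sources @ mixing.T

z = whiten(activations)
directions = fast_ica(z, n_components=3)
recovered = z @ directions.T

# ICA recovers sources only up to sign and order, so compare by |correlation|.
corr = np.corrcoef(recovered.T, sources.T)[:3, 3:]
best_match = np.abs(corr).max(axis=1)
print(best_match)
```

Each entry of `best_match` is the strongest absolute correlation between one recovered component and any true source. No dictionary is trained by gradient descent here: the directions come from whitening followed by a fixed-point iteration.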

Community

Many meaningful activation directions are selective: they fire for recurring words, constructions, topics, contexts, or discourse patterns rather than for typical random projections. That selectivity leaves a non-Gaussian footprint. ICA Lens turns this footprint into a practical workflow for finding, inspecting, and testing compact signed directions before reaching for costly learned dictionaries such as large SAEs.
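The "non-Gaussian footprint" can be illustrated with excess kurtosis, which is near zero for Gaussian data and large for rarely firing signals. The sketch below uses synthetic features, not real model activations. The 2% firing rate, the feature count, and the `excess_kurtosis` helper are assumptions made for the example.

```python
import numpy as np


def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    x = x - x.mean()
    return (x ** 4).mean() / (x ** 2).mean() ** 2 - 3.0


rng = np.random.default_rng(0)
n_tokens, n_features = 10000, 128

# Hypothetical selective features: each fires on about 2% of tokens.
fires = rng.random((n_tokens, n_features)) < 0.02
features = fires * rng.normal(loc=3.0, scale=1.0, size=(n_tokens, n_features))

# A single selective feature versus a random unit-norm projection of all of them.
selective = features[:, 0]
random_direction = rng.normal(size=n_features)
random_direction /= np.linalg.norm(random_direction)
random_projection = features @ random_direction

print("selective direction:", excess_kurtosis(selective))
print("random projection:  ", excess_kurtosis(random_projection))
```

The random projection averages many independent features, so it sits much closer to Gaussian than any single selective feature. That gap is the signal an ICA-style search exploits.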

Get this paper in your agent:

hf papers read 2606.11722
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
