Hugging Face Daily Papers · 5 min read

Whisper Hallucination Detection and Mitigation via Hidden Representation Steering and Sparse AutoEncoders

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2606.07473

Published on Jun 5 · Submitted by Georgii Aparin on Jun 9
Authors: Georgii Aparin, Vadim Popov, Tasnima Sadekova, Assel Yermekova (HUAWEI Noah's Ark Lab)

Abstract

Research demonstrates that hallucinations in Whisper ASR can be detected and reduced using internal representations from audio encoder activations and Sparse AutoEncoder latents, achieving significant hallucination rate reduction with minimal speech transcription degradation.

Whisper, a widely adopted ASR model, is known to suffer from hallucinations: coherent transcriptions, generated for non-speech audio, that are entirely disconnected from the input. We investigate whether hallucinations can be detected and mitigated through Whisper's internal representations. We extract audio encoder activations and evaluate two representation spaces: raw Whisper activations and Sparse AutoEncoder (SAE) latents. We show that both spaces encode linearly separable hallucination-related information, with discriminative power concentrated in a sparse feature subset and increasing toward deeper encoder layers. We propose two steering strategies: activation-space steering and SAE latent-space steering. SAE-based steering reduces the hallucination rate from 72.63% to 14.11% for Whisper small and from 86.88% to 27.33% for Whisper large-v3 on the full non-speech test set, with only a small WER degradation on speech data, approaching the performance of fine-tuning-based methods.
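To make "linearly separable, with discriminative power concentrated in a sparse feature subset" concrete, here is a minimal sketch on synthetic data. Everything in it is an assumption for illustration: the vectors are random numbers rather than Whisper encoder activations, the dimensions and offsets are invented, and the difference-of-means classifier is a stand-in for whatever linear probe the paper actually trains.

```python
import random

def make_samples(n, dim, shift_dims, shift, rng):
    """Draw n synthetic 'activation' vectors; only shift_dims carry a class offset."""
    samples = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for d in shift_dims:
            v[d] += shift
        samples.append(v)
    return samples

def mean_vector(samples):
    """Per-dimension mean of a list of equal-length vectors."""
    return [sum(col) / len(samples) for col in zip(*samples)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_mean_difference_probe(neg, pos):
    """Linear probe: weights are the class-mean difference, threshold at the midpoint."""
    mu_neg, mu_pos = mean_vector(neg), mean_vector(pos)
    w = [p - q for p, q in zip(mu_pos, mu_neg)]
    midpoint = [(p + q) / 2 for p, q in zip(mu_pos, mu_neg)]
    return w, -dot(w, midpoint)

def predict(w, b, x):
    """True means 'hallucination-like' under this toy probe."""
    return dot(w, x) + b > 0

rng = random.Random(0)
DIM = 32                    # hypothetical activation width
SIGNAL_DIMS = (3, 17, 29)   # hypothetical sparse subset carrying the signal

train_speech = make_samples(200, DIM, (), 0.0, rng)
train_halluc = make_samples(200, DIM, SIGNAL_DIMS, 3.0, rng)
test_speech = make_samples(200, DIM, (), 0.0, rng)
test_halluc = make_samples(200, DIM, SIGNAL_DIMS, 3.0, rng)

w, b = fit_mean_difference_probe(train_speech, train_halluc)
correct = sum(not predict(w, b, x) for x in test_speech)
correct += sum(predict(w, b, x) for x in test_halluc)
accuracy = correct / (len(test_speech) + len(test_halluc))

# The largest-magnitude probe weights point at the dimensions that carry the signal.
top_dims = sorted(sorted(range(DIM), key=lambda d: abs(w[d]), reverse=True)[:3])
print(f"probe accuracy: {accuracy:.3f}, top dims: {top_dims}")
```

In a real experiment the synthetic vectors would be replaced by pooled encoder activations (or SAE latents) from a chosen layer, and the probe would be evaluated layer by layer to see where separability increases.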

Community

Comment from the paper author and submitter, Georgii Aparin:

Whisper Hallucination Detection and Mitigation via Hidden Representation Steering and Sparse AutoEncoders

Whisper has a well-known failure mode: feed it silence, noise, or music, and it will often respond with confidently fabricated transcripts. This paper shows you can detect and mitigate these hallucinations purely from the model's internal activations without fine-tuning.

We probe two representation spaces in Whisper's audio encoder: raw activations and Sparse AutoEncoder (SAE) latents. Both turn out to encode linearly separable hallucination signals, concentrated in a sparse subset of features that strengthen in deeper layers. Steering activations away from these directions at inference yields large drops in hallucination rate on non-speech samples from different datasets:

  • Whisper small: 72.63% → 14.11% hallucination rate on non-speech samples
  • Whisper large-v3: 86.88% → 27.33%
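For readers comparing the two models, the reported rates can be restated as absolute and relative reductions. The short sketch below only does arithmetic on the four numbers quoted above; the `reduction` helper is a hypothetical name introduced here. It works out to a drop of 58.52 percentage points (roughly 81% relative) for Whisper small and 59.55 points (roughly 69% relative) for Whisper large-v3.

```python
# Reported hallucination rates (percent) before and after steering,
# copied from the figures quoted in the post.
reported = {
    "Whisper small": (72.63, 14.11),
    "Whisper large-v3": (86.88, 27.33),
}

def reduction(before, after):
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute = before - after
    relative = 100.0 * absolute / before
    return absolute, relative

summary = {name: reduction(b, a) for name, (b, a) in reported.items()}
for name, (absolute, relative) in summary.items():
    print(f"{name}: -{absolute:.2f} points ({relative:.1f}% relative reduction)")
```

The absolute drops are nearly identical, but because large-v3 starts from a higher hallucination rate, its relative reduction is smaller.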

WER on regular speech data barely budges, and the method reaches numbers competitive with fine-tuning approaches like Calm-Whisper without touching any model weights. A finding worth highlighting: because steering only a handful of encoder-side SAE features is enough to suppress hallucinations, the hallucination signal is not purely a decoder-side generation issue; it is already encoded in Whisper's encoder representations of non-speech audio.
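One common way to steer through an SAE is to encode an activation, rescale a few chosen latents, decode, and add the resulting change back to the original activation so that the SAE's reconstruction error is left untouched. The sketch below illustrates that pattern with a toy, hand-built SAE. It is an assumption-laden illustration, not the paper's procedure: the weights, the `steer` function, the choice of `scale=0.0`, and the feature index are all hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def encode(h, w_enc, b_enc):
    """SAE encoder: ReLU(W_enc h + b_enc), one row of w_enc per latent."""
    return [max(0.0, dot(row, h) + b) for row, b in zip(w_enc, b_enc)]

def decode(z, w_dec):
    """SAE decoder: sum of decoder vectors weighted by latents (one row per latent)."""
    dim = len(w_dec[0])
    return [sum(z[k] * w_dec[k][d] for k in range(len(z))) for d in range(dim)]

def steer(h, w_enc, b_enc, w_dec, features, scale=0.0):
    """Rescale the chosen SAE latents and apply only the induced change to h.

    Adding (decode(z_steered) - decode(z)) to h, instead of replacing h with
    the reconstruction, keeps whatever the SAE fails to reconstruct.
    """
    z = encode(h, w_enc, b_enc)
    z_steered = [v * scale if k in features else v for k, v in enumerate(z)]
    before = decode(z, w_dec)
    after = decode(z_steered, w_dec)
    return [x + (a - b) for x, a, b in zip(h, after, before)]

# Toy SAE (assumption): 3 latents over a 3-dimensional activation, identity weights.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ZERO_BIAS = [0.0, 0.0, 0.0]

h = [2.0, -1.0, 0.5]  # a made-up encoder activation
# Pretend latent 0 is a "hallucination feature" and switch it off.
steered = steer(h, IDENTITY, ZERO_BIAS, IDENTITY, features={0})
print(steered)
```

With a real model, `h` would be an encoder hidden state at a chosen layer, the SAE weights would come from a trained SAE, and the steered activation would be written back into the forward pass before decoding.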

Paper: https://arxiv.org/abs/2606.07473
