arXiv — NLP / Computation & Language · May 13, 2026 · 1 min read

Mechanistic Interpretability of ASR models using Sparse Autoencoders

Mirrored from arXiv — NLP / Computation & Language for archival readability. Support the source by reading on the original site.

Like Read original ↗

arXiv:2605.12225v1 Announce Type: new Abstract: Understanding the internal machinations of deep Transformer-based NLP models is more crucial than ever as these models see widespread use in various domains that affect the public at large, such as industry, academia, finance, health. While these models have advanced rapidly, their internal mechanisms remain largely a mystery. Techniques such as Sparse Autoencoders (SAE) have emerged to understand these mechanisms by projecting dense representations into a sparse vector. While existing research has demonstrated the viability of the SAE in interpreting text-based Large Language Models (LLMs), there are no equivalent studies that demonstrate the application of a SAE to audio processing models like Automatic Speech Recognizers (ASRs). In this work, a SAE is applied to Whisper, a Transformer-based ASR, training a high-dimensional sparse latent space on frame-level embeddings extracted from the Whisper encoder. Our work uncovers diverse monosemantic features across linguistic and non-linguistic boundaries, and demonstrates cross-lingual feature steering. This work establishes the viability of a SAE model and demonstrates that Whisper encodes a rich amount of linguistic information.

Discussion (0)

No comments yet. Sign in and be the first to say something.

Discussion (0)

More from arXiv — NLP / Computation & Language