Hugging Face Daily Papers · 3 min read

Language-Switching Triggers Take a Latent Detour Through Language Models

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.18646

Published on May 18, 2026 · Submitted by Francis Kulumba on May 20, 2026
Authors: Francis Kulumba, Wissam Antoun, Théo Lasnier, Benoît Sagot, Djamé Seddah

Abstract

AI-generated summary: A three-word Latin trigger in an 8B-parameter language model redirects English output to French through a circuit involving attention heads, orthogonal latent subspaces, and final-layer MLP conversion.

Backdoor attacks on language models pose a growing security concern, yet the internal mechanisms by which a trigger sequence hijacks model computations remain poorly understood. We identify a circuit underlying a language-switching backdoor in an 8B-parameter autoregressive language model, where a three-word Latin trigger (nine tokens) redirects English output to French. We decompose the circuit into three phases: (1) distributed attention heads at early layers compose the trigger tokens into the last sequence position; (2) the resulting signal propagates through mid-layers in a subspace orthogonal to the model's natural language-identity direction; (3) the MLP at the final layer converts this latent signal into French logits. The entire circuit flows through a serial bottleneck at a single position: corrupting that position at any layer entirely mitigates the trigger but also hinders the model's capabilities. The orthogonal latent encoding suggests that defenses that search for language-like signals in intermediate representations would miss this trigger entirely.
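The closing claim of the abstract, that a probe looking for language-like signals would miss a trigger encoded orthogonally to the language-identity direction, can be illustrated with a toy numerical sketch. This is not the authors' code or model: the hidden size, the random `lang_dir` and `trigger_dir` vectors, and the `probe_score` helper are all hypothetical stand-ins chosen only to show the geometry.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean length of a vector."""
    return math.sqrt(dot(u, u))


def project_out(v, direction):
    """Return the component of v orthogonal to direction (one Gram-Schmidt step)."""
    scale = dot(v, direction) / dot(direction, direction)
    return [a - scale * b for a, b in zip(v, direction)]


def probe_score(hidden, direction):
    """Signed length of hidden along the unit-normalised direction."""
    return dot(hidden, direction) / norm(direction)


random.seed(0)
dim = 16  # toy hidden size (assumption), far smaller than a real 8B model

# Hypothetical "language-identity" direction, e.g. a French-minus-English mean difference.
lang_dir = [random.gauss(0, 1) for _ in range(dim)]

# Hypothetical trigger signal, constructed to be orthogonal to lang_dir.
trigger_dir = project_out([random.gauss(0, 1) for _ in range(dim)], lang_dir)

base = [random.gauss(0, 1) for _ in range(dim)]  # clean mid-layer state
natural_french = [b + 3.0 * l for b, l in zip(base, lang_dir)]
triggered = [b + 3.0 * t for b, t in zip(base, trigger_dir)]

clean_score = probe_score(base, lang_dir)
french_shift = probe_score(natural_french, lang_dir) - clean_score
trigger_shift = probe_score(triggered, lang_dir) - clean_score
hidden_shift = norm([t - b for t, b in zip(triggered, base)])

print("probe shift for natural French state:", french_shift)
print("probe shift for triggered state:", trigger_shift)
print("actual change in the triggered hidden state:", hidden_shift)
```

In this sketch the triggered state moves substantially in the hidden space, yet its score along `lang_dir` is unchanged up to floating-point error, which is the sense in which a defense that only monitors the language-identity direction would see nothing.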

Community

Paper submitter about 4 hours ago

Language-switching triggers analysis on a decoder-based model.


Get this paper in your agent:

hf papers read 2605.18646
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
