Hugging Face Daily Papers · 4 min read

Grammar-Constrained Decoding Can Jailbreak LLMs into Generating Malicious Code

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2606.11817
Code: https://github.com/TsinghuaISE/CodeSpear-CodeShield


Published on Jun 10, 2026 · Submitted by zhang on Jun 11, 2026
Authors: Yitong Zhang, Shiteng Lu, Jia Li

Abstract

Grammar-constrained decoding, used to ensure syntactic validity in code generation, can be exploited as an attack surface. The paper introduces a jailbreak method, CodeSpear, that exploits it, and a safety alignment approach, CodeShield, that defends against it.

Large Language Models (LLMs) are increasingly used for code generation, raising concerns that they may be misused to produce malicious code. Meanwhile, Grammar-Constrained Decoding (GCD) has been widely adopted to improve the reliability of LLM-generated code by enforcing syntactic validity.

In this paper, we reveal a counterintuitive risk: this reliability-oriented technique can itself become an attack surface. We uncover a new jailbreak attack, termed CodeSpear, that exploits GCD to induce LLMs into generating malicious code. Our experiments show that simply applying a benign code grammar constraint can effectively jailbreak LLMs.

To address this vulnerability, we propose CodeShield, a safety alignment approach that robustly preserves safe behavior even under attacker-controlled grammar constraints. CodeShield aligns the model in the code modality by teaching it to generate honeypot code under GCD. Such code is semantically harmless, so it does not implement the malicious request, and structurally diverse, so it is difficult to suppress through grammar tightening. At the same time, CodeShield still preserves natural-language refusals when natural language is available. Experiments on 10 popular LLMs across 4 benchmarks show that CodeSpear outperforms representative jailbreak baselines and increases the attack success rate by more than 30 percentage points on average. CodeShield also restores safety under CodeSpear while preserving benign utility.

Our findings reveal a fundamental risk of GCD and call for greater attention to its potential security implications.
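The abstract's premise depends on how GCD works. At every decoding step, tokens that would take the output outside the grammar are masked out before one is selected. Text the grammar does not admit can therefore never be produced, however strongly the model prefers it. The Python sketch below is our own toy illustration of generic GCD. It is not the authors' code and not CodeSpear itself. The vocabulary, the refusal-preferring bigram scorer `BIGRAM`, and the token-level automaton `DFA` standing in for a code grammar are all hypothetical. Real implementations compile a full grammar into masks over an entire tokenizer vocabulary.

```python
from typing import Dict, List, Tuple

# Hypothetical toy vocabulary; real tokenizers have tens of thousands of tokens.
VOCAB: List[str] = ["I", " cannot", " help", ".", "def", " f", "():",
                    " return", " 0", " None", "<eos>"]

# Hypothetical bigram scorer standing in for an LLM's next-token logits.
# It strongly prefers a natural-language refusal, like a safety-aligned model.
BIGRAM: Dict[str, Dict[str, float]] = {
    "<bos>": {"I": 5.0, "def": 1.0},
    "I": {" cannot": 5.0},
    " cannot": {" help": 5.0},
    " help": {".": 5.0},
    ".": {"<eos>": 5.0},
    "def": {" f": 3.0},
    " f": {"():": 3.0},
    "():": {" return": 3.0},
    " return": {" None": 2.0, " 0": 1.0},
    " 0": {"<eos>": 3.0},
    " None": {"<eos>": 3.0},
}

# Hypothetical "code grammar" as a token-level automaton: (state, token) -> next state.
# It admits only `def f(): return 0` or `def f(): return None`, followed by <eos>.
DFA: Dict[Tuple[int, str], int] = {
    (0, "def"): 1,
    (1, " f"): 2,
    (2, "():"): 3,
    (3, " return"): 4,
    (4, " 0"): 5,
    (4, " None"): 5,
    (5, "<eos>"): 6,
}


def score(prev: str, tok: str) -> float:
    """Score of `tok` after `prev`; unseen pairs get a very low (not forbidden) score."""
    return BIGRAM.get(prev, {}).get(tok, -100.0)


def greedy_decode(constrained: bool, max_steps: int = 10) -> List[str]:
    """Greedy decoding, optionally masked by the DFA (grammar-constrained)."""
    out: List[str] = []
    prev, state = "<bos>", 0
    for _ in range(max_steps):
        if constrained:
            # The GCD mask: only tokens with a legal transition from `state` survive,
            # which for greedy selection equals setting every other logit to -inf.
            allowed = [t for t in VOCAB if (state, t) in DFA]
        else:
            allowed = VOCAB
        tok = max(allowed, key=lambda t: score(prev, t))
        if tok == "<eos>":
            break
        out.append(tok)
        if constrained:
            state = DFA[(state, tok)]
        prev = tok
    return out


if __name__ == "__main__":
    print("".join(greedy_decode(constrained=False)))  # I cannot help.
    print("".join(greedy_decode(constrained=True)))   # def f(): return None
```

Without the mask, the toy scorer emits its preferred refusal. With the mask, the refusal tokens are unreachable and the output is a syntactically valid stub. In this toy the stub happens to be harmless. Per the abstract, CodeShield trains models to produce such semantically harmless "honeypot" code under GCD, whereas an unaligned model constrained this way may continue with code that fulfills the request, which is the behavior CodeSpear exploits.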

