Hugging Face Daily Papers · 4 min read

SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2606.02540

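Published on Jun 1, 2026 · Submitted by Yuting Ning on Jun 10, 2026

Authors: Yuting Ning, Zhehao Zhang, Yash Kumar Lal, Boyu Gou, Junyi Li, Weitong Ruan, Chentao Ye, Rahul Gupta, Diyi Yang, Yu Su, Huan Sun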

Abstract

SkillHarm is a benchmark for evaluating skill-based attacks across the skill-use lifecycle, demonstrating significant vulnerabilities in current agents with attack success rates up to 86.3%.

Agent skills occupy a privileged position in the agent workflow, as agents are expected to implicitly follow and execute them, rendering third-party skills a vulnerable attack surface. Existing studies have revealed unsafe agent behaviors induced by skill-based attacks, but they primarily evaluate poisoned skills within a single task execution and enumerate harms through ad-hoc risk lists. To bridge these gaps, we introduce SkillHarm, a benchmark of skill-based attacks across the skill-use lifecycle, paired with a systematic taxonomy of skill-relevant risks. SkillHarm evaluates two attack scenarios: Fixed-Payload Poisoning (FPP), where a fixed poisoned skill package directly compromises any task session that invokes it, and Self-Mutating Poisoning (SMP), where an initially benign execution silently mutates persistent skill content, deferring harm until a subsequent reuse. It further defines 12 risk types based on the agent workflow component targeted by the harm: data pipelines, system environments, and agent autonomy. To instantiate these attacks at scale, we build AutoSkillHarm, an automated construction pipeline with coding agents driven by natural-language harnesses. The resulting benchmark contains 879 attack samples across 71 skills. Experiments show that current agents remain vulnerable with attack success rates up to 86.3% in FPP and 69.3% in SMP. Our analysis further reveals a latent risk: many apparent attack failures stem from the agent failing to engage with the poisoned file rather than genuine resistance, and current defenses still fail to reliably mitigate the threat.
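The latent risk noted in the abstract can be made concrete with a small bookkeeping sketch. It is not the paper's evaluation code: the `Trial` record, its `engaged` and `harmful` flags, and the `summarize` helper are hypothetical names introduced here, and the toy data is invented purely for illustration. The sketch reports the raw attack success rate next to the rate among sessions where the agent actually engaged with the poisoned file, so that failures caused by non-engagement are not mistaken for genuine resistance.

```python
from dataclasses import dataclass
from enum import Enum


class Scenario(Enum):
    FPP = "fixed-payload poisoning"
    SMP = "self-mutating poisoning"


@dataclass(frozen=True)
class Trial:
    scenario: Scenario
    engaged: bool  # hypothetical flag: the agent read or ran the poisoned file
    harmful: bool  # hypothetical flag: the harmful behavior was observed


def attack_success_rate(trials):
    """Fraction of trials in which harm occurred, or None for an empty list."""
    trials = list(trials)
    if not trials:
        return None
    return sum(t.harmful for t in trials) / len(trials)


def summarize(trials, scenario):
    """Report raw ASR, ASR among engaged sessions, and the share of
    failed attacks in which the agent never engaged with the file."""
    subset = [t for t in trials if t.scenario is scenario]
    engaged = [t for t in subset if t.engaged]
    failures = [t for t in subset if not t.harmful]
    unengaged_failures = [t for t in failures if not t.engaged]
    return {
        "asr": attack_success_rate(subset),
        "asr_given_engaged": attack_success_rate(engaged),
        "failures_without_engagement": (
            len(unengaged_failures) / len(failures) if failures else None
        ),
    }


# Invented toy data, not results from the benchmark.
toy_trials = [
    Trial(Scenario.FPP, engaged=True, harmful=True),
    Trial(Scenario.FPP, engaged=True, harmful=True),
    Trial(Scenario.FPP, engaged=True, harmful=False),
    Trial(Scenario.FPP, engaged=False, harmful=False),
    Trial(Scenario.SMP, engaged=True, harmful=True),
    Trial(Scenario.SMP, engaged=False, harmful=False),
]

fpp_summary = summarize(toy_trials, Scenario.FPP)
smp_summary = summarize(toy_trials, Scenario.SMP)
print(fpp_summary)
print(smp_summary)
```

A gap between `asr` and `asr_given_engaged` suggests that the raw rate understates how often an agent is compromised once it actually touches the poisoned content.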

