Hugging Face Daily Papers · May 29, 2026 · 6 min read

AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.



Papers

arxiv:2605.29801

AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

Published on May 28 · Submitted by Dongrui Liu on May 29

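Authors: Dongrui Liu, Yu Li, Zhonghao Yang, Peng Wang, Guanxu Chen, Yuejin Xie, Qinghua Mao, Wanying Qu, Yanxu Zhu, Tianyi Zhou, Leitao Yuan, Zhijie Zheng, Qihao Lin, Yimin Wang, Haoyu Luo, Shuai Shao, Chen Qian, Qingyu Liu, Ling Tang, Ruiyang Qin, Qihan Ren, Junxiao Yang, Kun Wang, Zhiheng Xi, Linfeng Zhang, Ranjie Duan, Bo Zhang, Wenjie Wang, Wen Shen, Qiaosheng Zhang, Yan Teng, Chaochao Lu, Rui Mei, Man Li, Jialing Tao, Xi Lin, Tianhang Zheng, Yong Liu, Quanshi Zhang, Lei Zhu, Xingjun Ma, Junhua Liu, Hui Xue, Xiaoxiang Zuo, Xiangnan He, Chao Shen, Xianglong Liu, Minlie Huang, Jing Shao, Xia Hu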

AI-generated summary

A lightweight and scalable agent safety alignment framework is proposed to address emerging threats from advanced AI models, featuring taxonomy-guided training with minimal samples and efficient deployment in real-world scenarios.

Abstract

Modern open-world agents such as OpenClaw exhibit powerful cross-environment execution capabilities yet introduce broad new safety risk sources. Meanwhile, advanced frontier AI models drastically lower attack barriers, rendering current agent alignment frameworks inadequate for real-world deployment. To tackle these emerging threats, we propose a lightweight and scalable agent safety alignment framework. Specifically, we update the agent safety taxonomy to accommodate emergent risks from Codex and OpenClaw execution scenarios. We further build a taxonomy-guided data engine with influence-function purification to train lightweight AgentDoG 1.5 variants (0.8B, 2B, 4B, and 8B parameters) using only around 1k samples, achieving performance comparable to leading closed-source models (e.g., GPT-5.4). Based on AgentDoG 1.5, we construct a highly efficient agentic safety SFT and RL training environment, which reduces deployment overhead in Docker-level environments by two orders of magnitude. Finally, we deploy AgentDoG 1.5 as a training-free online guardrail for real-time safety moderation. Extensive experimental results indicate that AgentDoG 1.5 achieves state-of-the-art performance in diverse and complex interactive agentic scenarios. All models and datasets are openly released.
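The abstract mentions deploying AgentDoG 1.5 as a training-free online guardrail but gives no interface details. The sketch below illustrates only the general pattern such a guardrail might follow: a judge inspects each proposed agent step before it runs, and blocked steps never reach the tools. Every name here (`Step`, `GuardedRun`, `toy_judge`, `run_with_guardrail`) is hypothetical, and the keyword rule is a stand-in for a guard model call, not the paper's method.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One proposed agent action: a tool name plus its argument string."""
    tool: str
    argument: str


@dataclass
class GuardedRun:
    """Outcome of a guarded run: which steps were allowed and which were blocked."""
    executed: List[Step] = field(default_factory=list)
    blocked: List[Step] = field(default_factory=list)


# Hypothetical judge signature: receives the steps executed so far and the
# proposed step, and returns True when the proposed step looks unsafe.
Judge = Callable[[List[Step], Step], bool]


def toy_judge(history: List[Step], step: Step) -> bool:
    """Keyword stand-in for a guard model; a real deployment would query the model here."""
    risky_fragments = ("rm -rf", "curl http", "/etc/shadow")
    return step.tool == "shell" and any(f in step.argument for f in risky_fragments)


def run_with_guardrail(plan: List[Step], judge: Judge) -> GuardedRun:
    """Check every step before execution; blocked steps are recorded and skipped."""
    run = GuardedRun()
    for step in plan:
        if judge(run.executed, step):
            run.blocked.append(step)
            continue
        run.executed.append(step)  # placeholder for actually invoking the tool
    return run


if __name__ == "__main__":
    plan = [
        Step("shell", "ls ./workspace"),
        Step("shell", "rm -rf /"),
        Step("browser", "open https://example.com"),
    ]
    result = run_with_guardrail(plan, toy_judge)
    print("executed:", [s.argument for s in result.executed])
    print("blocked:", [s.argument for s in result.blocked])
```

Because the judge is passed in as a plain callable, the loop needs no retraining of the agent itself, which is one reading of "training-free"; swapping `toy_judge` for a function that calls a guard model would leave `run_with_guardrail` unchanged.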