Auditing Agent Harness Safety
Authors: Chengzhi Liu, Yichen Guo, Yepeng Liu, Yuzhe Yang, Qianqi Yan, Xuandong Zhao, Wenyue Hua, Sheng Liu, Sharon Li, Yuheng Bu, Xin Eric Wang
Organization: UC Santa Barbara NLP Group
Published: 2026-05-14
Project page: https://harnessaudit.github.io/
Code: https://github.com/eric-ai-lab/HarnessAudit
AI-generated summary
LLM agents running inside execution harnesses can produce correct outputs while violating safety constraints mid-execution, so ensuring proper resource access and information flow across multi-agent systems requires trajectory-level auditing.
Abstract
LLM agents increasingly run inside execution harnesses that dispatch tools, allocate resources, and route messages between specialized components. However, a harness can return a correct, benign answer over a trajectory that accesses unauthorized resources or leaks context to the wrong agent. Output-level evaluation cannot see these failures, yet most safety benchmarks score only final outputs or terminal states, even though many violations occur mid-trajectory rather than at termination. The central question is whether the harness respects user intent, permission boundaries, and information-flow constraints throughout execution. To address this gap, we propose HarnessAudit, a framework that audits full execution trajectories across boundary compliance, execution fidelity, and system stability, with a focus on multi-agent harnesses where these risks are most pronounced. We further introduce HarnessAudit-Bench, a benchmark of 210 tasks across eight real-world domains, instantiated in both single-agent and multi-agent configurations with embedded safety constraints. Evaluating ten harness configurations across frontier models and three multi-agent frameworks, we find that: (i) task completion is misaligned with safe execution, and violations accumulate with trajectory length; (ii) safety risks vary across domains, task types, and agent roles; (iii) most violations concentrate in resource access and inter-agent information transfer; and (iv) multi-agent collaboration expands the safety risk surface, while harness design sets the upper bound of safe deployment.
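The gap between output-level and trajectory-level evaluation described above can be illustrated with a minimal sketch. This is not HarnessAudit's implementation or API: the `Step` and `Policy` types, the `audit_trajectory` and `output_level_check` functions, and the agent, resource, and tag names are all hypothetical, chosen only to show how a run can pass a final-answer check while containing an unauthorized resource access and an inter-agent information leak mid-trajectory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One recorded harness event (hypothetical schema, for illustration only)."""
    agent: str                       # agent performing the action
    action: str                      # "read" (resource access) or "send" (message)
    target: str                      # resource name for "read", receiving agent for "send"
    payload_tags: frozenset = frozenset()  # sensitivity tags carried by a message


@dataclass
class Policy:
    """Assumed permission model: per-agent readable resources and receivable tags."""
    readable: dict    # agent name -> set of resources it may read
    receivable: dict  # agent name -> set of payload tags it may receive


def audit_trajectory(steps, policy):
    """Check every step, not just the last one; return (index, violation) pairs."""
    violations = []
    for i, step in enumerate(steps):
        if step.action == "read":
            if step.target not in policy.readable.get(step.agent, set()):
                violations.append((i, "unauthorized_access"))
        elif step.action == "send":
            allowed = policy.receivable.get(step.target, set())
            if not step.payload_tags <= allowed:
                violations.append((i, "information_leak"))
    return violations


def output_level_check(final_answer, expected):
    """What an output-only benchmark would score: the final answer alone."""
    return final_answer == expected


# Hypothetical two-agent run: the planner books a meeting and asks the mailer to notify.
policy = Policy(
    readable={"planner": {"calendar"}, "mailer": set()},
    receivable={"mailer": {"schedule"}, "planner": {"schedule", "salary"}},
)
steps = [
    Step("planner", "read", "calendar"),
    Step("planner", "read", "payroll_db"),  # outside the planner's permissions
    Step("planner", "send", "mailer", frozenset({"schedule", "salary"})),  # leaks "salary"
]

output_ok = output_level_check("Meeting booked for 3pm", "Meeting booked for 3pm")
violations = audit_trajectory(steps, policy)
print(output_ok, violations)
```

In this toy run the output-level check passes, while the trajectory audit flags steps 1 and 2. The two violation labels loosely mirror the abstract's finding that violations concentrate in resource access and inter-agent information transfer; a real audit would also need to cover execution fidelity and system stability, which this sketch omits.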
arXiv: arxiv.org/abs/2605.14271