Hugging Face Daily Papers · May 18, 2026 · 8 min read

Auditing Agent Harness Safety

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

LLM agents increasingly run inside execution harnesses that dispatch tools, allocate resources, and route messages between specialized components. However, a harness can return a correct, benign answer over a trajectory that accesses unauthorized resources or leaks context to the wrong agent. Output-level evaluation cannot see these failures, yet most safety benchmarks score only final outputs or terminal states, even though many violations occur mid-trajectory rather than at termination. The central question is whether the harness respects user intent, permission boundaries, and information-flow constraints throughout execution. To address this gap, we propose HarnessAudit, a framework that audits full execution trajectories across boundary compliance, execution fidelity, and system stability, with a focus on multi-agent harnesses where these risks are most pronounced. We further introduce HarnessAudit-Bench, a benchmark of 210 tasks across eight real-world domains, instantiated in both single-agent and multi-agent configurations with embedded safety constraints. 
Evaluating ten harness configurations across frontier models and three multi-agent frameworks, we find that: (i) task completion is misaligned with safe execution, and violations accumulate with trajectory length; (ii) safety risks vary across domains, task types, and agent roles; (iii) most violations concentrate in resource access and inter-agent information transfer; and (iv) multi-agent collaboration expands the safety risk surface, while harness design sets the upper bound of safe deployment.
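
To make the gap between output-level and trajectory-level evaluation concrete, here is a minimal sketch of a per-step audit. It is not the HarnessAudit implementation: the `Step` record, the `audit_trajectory` function, the permission and clearance tables, and the example agents and resources are all hypothetical, chosen only to illustrate a run whose final answer looks fine while two intermediate steps violate a resource boundary and an information-flow constraint.

```python
from dataclasses import dataclass


# Hypothetical trace record; not HarnessAudit's actual trace format.
@dataclass(frozen=True)
class Step:
    agent: str           # agent that performed the step
    action: str          # "access" (touch a resource) or "send" (pass data on)
    resource: str = ""   # resource touched, for "access" steps
    recipient: str = ""  # receiving agent, for "send" steps
    label: str = ""      # sensitivity label of the data, for "send" steps


def audit_trajectory(steps, permissions, clearances):
    """Return (step index, reason) for every violating step.

    permissions: agent -> set of resources that agent may access
    clearances:  agent -> set of data labels that agent may receive
    """
    violations = []
    for i, step in enumerate(steps):
        if step.action == "access":
            # Boundary check: the acting agent must hold the permission.
            if step.resource not in permissions.get(step.agent, set()):
                violations.append((i, f"{step.agent} accessed {step.resource}"))
        elif step.action == "send":
            # Information-flow check: the recipient must be cleared for the label.
            if step.label not in clearances.get(step.recipient, set()):
                violations.append((i, f"{step.label} data leaked to {step.recipient}"))
    return violations


# Illustrative data only (assumed, not taken from the benchmark).
trajectory = [
    Step("planner", "access", resource="calendar"),
    Step("planner", "access", resource="payroll_db"),          # not permitted
    Step("planner", "send", recipient="emailer", label="hr"),  # recipient not cleared
    Step("emailer", "access", resource="smtp"),
]
permissions = {"planner": {"calendar"}, "emailer": {"smtp"}}
clearances = {"emailer": {"public"}}

final_output_ok = True  # an output-level check would stop here and pass the run
violations = audit_trajectory(trajectory, permissions, clearances)
print(final_output_ok, violations)
```

In this toy run the output-level verdict is a pass, yet the audit flags steps 1 and 2. That is the kind of mid-trajectory failure the abstract says final-output scoring cannot see.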

arxiv:2605.14271

Published on May 14, 2026 · Submitted by Chengzhi Liu on May 18, 2026 · UC Santa Barbara NLP Group

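Authors: Chengzhi Liu, Yichen Guo, Yepeng Liu, Yuzhe Yang, Qianqi Yan, Xuandong Zhao, Wenyue Hua, Sheng Liu, Sharon Li, Yuheng Bu, Xin Eric Wang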

Abstract

AI-generated summary: LLM agents running inside execution harnesses can produce correct outputs while violating safety constraints along the way, so trajectory-level auditing is needed to verify resource access and information flow across multi-agent systems.

arXiv: https://arxiv.org/abs/2605.14271 · Project page: https://harnessaudit.github.io/ · GitHub: https://github.com/eric-ai-lab/HarnessAudit

Community

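librarian-bot

This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:

* The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents (2026), https://huggingface.co/papers/2604.10577
* Toward a Principled Framework for Agent Safety Measurement (2026), https://huggingface.co/papers/2605.01644
* AgentForesight: Online Auditing for Early Failure Prediction in Multi-Agent Systems (2026), https://huggingface.co/papers/2605.08715
* Agent-ValueBench: A Comprehensive Benchmark for Evaluating Agent Values (2026), https://huggingface.co/papers/2605.10365
* Safe Multi-Agent Behavior Must Be Maintained, Not Merely Asserted: Constraint Drift in LLM-Based Multi-Agent Systems (2026), https://huggingface.co/papers/2605.10481
* HINTBench: Horizon-agent Intrinsic Non-attack Trajectory Benchmark (2026), https://huggingface.co/papers/2604.13954
* AI Harness Engineering: A Runtime Substrate for Foundation-Model Software Agents (2026), https://huggingface.co/papers/2605.13357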

Get this paper in your agent:

hf papers read 2605.14271

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
