Hugging Face Daily Papers · · 6 min read

BraveGuard: From Open-World Threats to Safer Computer-Use Agents

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Computer-use agents extend language models from text generation to sustained interaction with files, terminals, browsers, and external tools. This shift creates safety risks that are difficult to detect from isolated prompts or final responses, because harm often emerges only through multi-step execution traces whose individual actions appear locally benign. We introduce BraveGuard, a self-evolving defense framework for training guard models from open-world threat signals and realistic agent trajectories. BraveGuard mines recent research sources to identify emerging risks and attack patterns, instantiates them as executable computer-use tasks, collects agent rollouts, and derives trajectory-level supervision for guard model training. As new threats and validation failures appear, the pipeline can be repeated, yielding an adaptive defense loop rather than a static, benchmark-driven training process. We instantiate BraveGuard by training multiple guard backbones, including Qwen3-Guard and Llama-Guard variants, and evaluate the resulting guards on trajectory-level agent-safety benchmarks. BraveGuard consistently improves safety detection across computer-use trajectories. On AgentHazard, it substantially improves detection accuracy over off-the-shelf guard models, with accuracy increasing from 38.79% to 82.38% under the averaged guard-model setting. These results show that guard supervision grounded in open-world threat discovery and realistic agent execution can improve safety monitoring beyond fixed taxonomies and synthetic prompt-level data. BraveGuard offers a scalable path toward adaptive defenses for computer-use agents facing evolving real-world risks.</p>\n","updatedAt":"2026-06-04T02:18:56.106Z","author":{"_id":"69d47558a9acd1eb26637fe9","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/69d47558a9acd1eb26637fe9/ZWSP5HU9uU7Bc2a1I54eq.jpeg","fullname":"YunHao-Feng","name":"Yunhao-Feng","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.900614857673645},"editors":["Yunhao-Feng"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/69d47558a9acd1eb26637fe9/ZWSP5HU9uU7Bc2a1I54eq.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2606.01166","authors":[{"_id":"6a202b6615100c5272a841de","name":"Yunhao Feng","hidden":false},{"_id":"6a202b6615100c5272a841df","name":"Xiaohu Du","hidden":false},{"_id":"6a202b6615100c5272a841e0","name":"Xinhao Deng","hidden":false},{"_id":"6a202b6615100c5272a841e1","name":"Yifan Ding","hidden":false},{"_id":"6a202b6615100c5272a841e2","name":"Ming Wen","hidden":false},{"_id":"6a202b6615100c5272a841e3","name":"Yixu Wang","hidden":false},{"_id":"6a202b6615100c5272a841e4","name":"Yuxiang Xie","hidden":false},{"_id":"6a202b6615100c5272a841e5","name":"Baihui Zheng","hidden":false},{"_id":"6a202b6615100c5272a841e6","name":"Yingshui Tan","hidden":false},{"_id":"6a202b6615100c5272a841e7","name":"Yige Li","hidden":false},{"_id":"6a202b6615100c5272a841e8","name":"Yutao Wu","hidden":false},{"_id":"6a202b6615100c5272a841e9","name":"Kerui Cao","hidden":false},{"_id":"6a202b6615100c5272a841ea","name":"Wenke Huang","hidden":false},{"_id":"6a202b6615100c5272a841eb","name":"Yanming Guo","hidden":false},{"_id":"6a202b6615100c5272a841ec","name":"Xingjun Ma","hidden":false},{"_id":"6a202b6615100c5272a841ed","name":"Yu-Gang Jiang","hidden":false}],"publishedAt":"2026-06-02T00:00:00.000Z","submittedOnDailyAt":"2026-06-04T00:00:00.000Z","title":"BraveGuard: From Open-World Threats to Safer Computer-Use Agents","submittedOnDailyBy":{"_id":"69d47558a9acd1eb26637fe9","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/69d47558a9acd1eb26637fe9/ZWSP5HU9uU7Bc2a1I54eq.jpeg","isPro":false,"fullname":"YunHao-Feng","user":"Yunhao-Feng","type":"user","name":"Yunhao-Feng"},"summary":"Computer-use agents extend language models from text generation to sustained interaction with files, terminals, browsers, and external tools. This shift creates safety risks that are difficult to detect from isolated prompts or final responses, because harm often emerges only through multi-step execution traces whose individual actions appear locally benign. We introduce BraveGuard, a self-evolving defense framework for training guard models from open-world threat signals and realistic agent trajectories. BraveGuard mines recent research sources to identify emerging risks and attack patterns, instantiates them as executable computer-use tasks, collects agent rollouts, and derives trajectory-level supervision for guard model training. As new threats and validation failures appear, the pipeline can be repeated, yielding an adaptive defense loop rather than a static, benchmark-driven training process. We instantiate BraveGuard by training multiple guard backbones, including Qwen3-Guard and Llama-Guard variants, and evaluate the resulting guards on trajectory-level agent-safety benchmarks. BraveGuard consistently improves safety detection across computer-use trajectories. On AgentHazard, it substantially improves detection accuracy over off-the-shelf guard models, with accuracy increasing from 38.79% to 82.38% under the averaged guard-model setting. These results show that guard supervision grounded in open-world threat discovery and realistic agent execution can improve safety monitoring beyond fixed taxonomies and synthetic prompt-level data. BraveGuard offers a scalable path toward adaptive defenses for computer-use agents facing evolving real-world risks.","upvotes":4,"discussionId":"6a202b6615100c5272a841f4","githubRepo":"https://github.com/Yunhao-Feng/BraveGuard","githubRepoAddedBy":"user","ai_summary":"BraveGuard is a self-evolving defense framework that trains guard models using open-world threat signals and realistic agent trajectories to improve safety detection in computer-use agents.","ai_keywords":["guard models","computer-use agents","safety risks","agent trajectories","open-world threat signals","executable computer-use tasks","trajectory-level supervision","adaptive defense loop","guard backbones","Qwen3-Guard","Llama-Guard","AgentHazard","safety detection"],"ai_summary_model":"Qwen/Qwen2.5-Coder-32B-Instruct","githubStars":27,"organization":{"_id":"67c1d682826160b28f778510","name":"antgroup","fullname":"Ant Group","avatar":"https://cdn-avatars.huggingface.co/v1/production/uploads/662e1f9da266499277937d33/7VcPHdLSGlged3ixK1dys.jpeg"}},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"69d47558a9acd1eb26637fe9","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/69d47558a9acd1eb26637fe9/ZWSP5HU9uU7Bc2a1I54eq.jpeg","isPro":false,"fullname":"YunHao-Feng","user":"Yunhao-Feng","type":"user"},{"_id":"620783f24e28382272337ba4","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/620783f24e28382272337ba4/zkUveQPNiDfYjgGhuFErj.jpeg","isPro":false,"fullname":"GuoLiangTang","user":"Tommy930","type":"user"},{"_id":"635f7b4af72ae36c3e3223be","avatarUrl":"/avatars/afd8967e2e0dbddc0cf5692ec5673ba1.svg","isPro":false,"fullname":"Yunhao Chen","user":"dongdongunique","type":"user"},{"_id":"66935bdc5489e4f73c76bc7b","avatarUrl":"/avatars/129d1e86bbaf764b507501f4feb177db.svg","isPro":false,"fullname":"Abidoye Aanuoluwapo","user":"Aanuoluwapo65","type":"user"}],"acceptLanguages":["en"],"dailyPaperRank":0,"organization":{"_id":"67c1d682826160b28f778510","name":"antgroup","fullname":"Ant Group","avatar":"https://cdn-avatars.huggingface.co/v1/production/uploads/662e1f9da266499277937d33/7VcPHdLSGlged3ixK1dys.jpeg"},"markdownContentUrl":"https://huggingface.co/buckets/huggingchat/papers-content/resolve/2606/2606.01166.md"}">
Papers
arxiv:2606.01166

BraveGuard: From Open-World Threats to Safer Computer-Use Agents

Published on Jun 2
· Submitted by
YunHao-Feng
on Jun 4
Authors:
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,

Abstract

BraveGuard is a self-evolving defense framework that trains guard models using open-world threat signals and realistic agent trajectories to improve safety detection in computer-use agents.

Computer-use agents extend language models from text generation to sustained interaction with files, terminals, browsers, and external tools. This shift creates safety risks that are difficult to detect from isolated prompts or final responses, because harm often emerges only through multi-step execution traces whose individual actions appear locally benign. We introduce BraveGuard, a self-evolving defense framework for training guard models from open-world threat signals and realistic agent trajectories. BraveGuard mines recent research sources to identify emerging risks and attack patterns, instantiates them as executable computer-use tasks, collects agent rollouts, and derives trajectory-level supervision for guard model training. As new threats and validation failures appear, the pipeline can be repeated, yielding an adaptive defense loop rather than a static, benchmark-driven training process. We instantiate BraveGuard by training multiple guard backbones, including Qwen3-Guard and Llama-Guard variants, and evaluate the resulting guards on trajectory-level agent-safety benchmarks. BraveGuard consistently improves safety detection across computer-use trajectories. On AgentHazard, it substantially improves detection accuracy over off-the-shelf guard models, with accuracy increasing from 38.79% to 82.38% under the averaged guard-model setting. These results show that guard supervision grounded in open-world threat discovery and realistic agent execution can improve safety monitoring beyond fixed taxonomies and synthetic prompt-level data. BraveGuard offers a scalable path toward adaptive defenses for computer-use agents facing evolving real-world risks.

Community

Paper submitter about 7 hours ago

Computer-use agents extend language models from text generation to sustained interaction with files, terminals, browsers, and external tools. This shift creates safety risks that are difficult to detect from isolated prompts or final responses, because harm often emerges only through multi-step execution traces whose individual actions appear locally benign. We introduce BraveGuard, a self-evolving defense framework for training guard models from open-world threat signals and realistic agent trajectories. BraveGuard mines recent research sources to identify emerging risks and attack patterns, instantiates them as executable computer-use tasks, collects agent rollouts, and derives trajectory-level supervision for guard model training. As new threats and validation failures appear, the pipeline can be repeated, yielding an adaptive defense loop rather than a static, benchmark-driven training process. We instantiate BraveGuard by training multiple guard backbones, including Qwen3-Guard and Llama-Guard variants, and evaluate the resulting guards on trajectory-level agent-safety benchmarks. BraveGuard consistently improves safety detection across computer-use trajectories. On AgentHazard, it substantially improves detection accuracy over off-the-shelf guard models, with accuracy increasing from 38.79% to 82.38% under the averaged guard-model setting. These results show that guard supervision grounded in open-world threat discovery and realistic agent execution can improve safety monitoring beyond fixed taxonomies and synthetic prompt-level data. BraveGuard offers a scalable path toward adaptive defenses for computer-use agents facing evolving real-world risks.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.
Tap or paste here to upload images

· Sign up or log in to comment

Get this paper in your agent:

hf papers read 2606.01166
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 1

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2606.01166 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2606.01166 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.

Discussion (0)

Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.

Sign in →

No comments yet. Sign in and be the first to say something.

More from Hugging Face Daily Papers