Hugging Face Daily Papers · June 16, 2026 · 4 min read

PhoneHarness: Harnessing Phone-Use Agents through Mixed GUI, CLI, and Tool Actions

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.


PhoneHarness introduces a mixed-action execution harness and benchmark for phone-use agents across GUI, CLI, and MCP-style tools. It evaluates agents with trace-backed, verifier-based side-effect checks rather than scoring only next-screen actions.


arxiv:2606.14832


Published on Jun 12, 2026 · Submitted by Chenxin Li on Jun 16, 2026

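Authors:

Chenxin Li, Zhengyao Fang, Zhengyang Tang, Pengyuan Lyu, Xingran Zhou, Xin Lai, Fei Tang, Liang Wu, Yiduo Guo, Weinong Wang, Junyi Li, Yi Zhang, Yang Ding, Huawen Shen, Sunqi Fan, Shangpin Peng, Zheng Ruan, Anran Zhang, Benyou Wang, Chengquan Zhang, Han Hu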

AI-generated summary (Qwen/Qwen2.5-Coder-32B-Instruct): PhoneHarness presents a mixed-action benchmark and execution framework for evaluating phone-use agents on verifiable mobile workflows, demonstrating superior performance over existing approaches through deterministic action routing and auditable execution traces.

Abstract

Phone agents are increasingly expected to complete real mobile workflows rather than merely predict the next screen action. However, much of the current mobile-agent literature still evaluates agents primarily as GUI controllers that observe a screen, emit taps and swipes, and are scored by target app state. Real phone-use tasks are broader: they require deciding when to use app GUIs, device-side commands, or structured tools, while leaving evidence that the intended side effect actually occurred. We introduce PhoneHarness, a mixed-action benchmark and execution harness for studying phone-use agents on verifiable mobile workflows. PhoneHarness runs a device-side agent loop over GUI, CLI, and host-side tool actions, combining deterministic action routing with bounded GUI delegation and auditable execution traces. Its benchmark, PhoneHarness Bench, evaluates whether agents complete tasks with observable side effects, not only whether they produce plausible final answers. On the annotated evaluation split, PhoneHarness reaches a 75.0% pass rate, outperforming the strongest non-PhoneHarness settings by 12.9 percentage points. PhoneHarness and PhoneHarness Bench therefore play distinct but mutually dependent roles: the harness makes mixed phone workflows executable, while the benchmark measures whether agents can use that harness reliably and safely. Our findings suggest that reliable phone automation depends on action-surface routing and verifiable execution, not only visual GUI control.

arXiv page: arxiv.org/abs/2606.14832 · Project page: https://phoneharness.github.io/ · GitHub: https://github.com/PhoneHarness/PhoneHarness (20 stars)
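The abstract names three mechanisms without showing code: deterministic routing of each action to a GUI, CLI, or tool surface; bounded delegation of GUI work; and auditable traces that are scored by checking side effects rather than final answers. The Python sketch below is a hypothetical illustration of how those pieces could fit together. It is not the PhoneHarness implementation or API: the names `Harness`, `Action`, `route`, and `passes`, the tool-then-CLI-then-GUI precedence, the GUI step budget of 5, and the dictionary standing in for device state are all assumptions made for illustration.

```python
# Hypothetical sketch of mixed-action routing with trace-backed verification.
# None of these names come from PhoneHarness; they are illustrative only.
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Surface(Enum):
    TOOL = "tool"  # structured host-side tool call (MCP-style)
    CLI = "cli"    # device-side command
    GUI = "gui"    # taps/swipes delegated to a GUI sub-agent


@dataclass
class Action:
    name: str
    args: dict = field(default_factory=dict)


@dataclass
class TraceEntry:
    surface: Surface
    action: str
    ok: bool
    detail: str


# Assumed GUI sub-agent interface: (action, step_index) -> finished?
GuiAgent = Callable[[Action, int], bool]


@dataclass
class Harness:
    tools: dict[str, Callable[..., str]]
    cli: dict[str, Callable[..., str]]
    gui_step_budget: int = 5  # assumed bound on delegated GUI steps
    trace: list[TraceEntry] = field(default_factory=list)

    def route(self, action: Action) -> Surface:
        """Deterministic: the same action name always maps to the same surface."""
        if action.name in self.tools:
            return Surface.TOOL
        if action.name in self.cli:
            return Surface.CLI
        return Surface.GUI  # fall back to GUI only when no structured route exists

    def execute(self, action: Action, gui_agent: GuiAgent) -> bool:
        surface = self.route(action)
        if surface is Surface.TOOL:
            ok, detail = True, self.tools[action.name](**action.args)
        elif surface is Surface.CLI:
            ok, detail = True, self.cli[action.name](**action.args)
        else:
            # Bounded delegation: the GUI sub-agent gets at most gui_step_budget steps.
            ok, detail = False, "gui step budget exhausted"
            for step in range(self.gui_step_budget):
                if gui_agent(action, step):
                    ok, detail = True, f"gui finished in {step + 1} steps"
                    break
        # Every action, successful or not, is recorded so the run can be audited.
        self.trace.append(TraceEntry(surface, action.name, ok, detail))
        return ok


def passes(verifier: Callable[[dict], bool], state: dict, trace: list[TraceEntry]) -> bool:
    """Pass only if some action actually ran AND the side effect is observable."""
    return any(e.ok for e in trace) and verifier(state)


# Usage: a simulated device where setting an alarm leaves an observable side effect.
device = {"alarms": []}


def set_alarm(time: str) -> str:
    device["alarms"].append(time)
    return f"alarm {time} created"


harness = Harness(tools={"set_alarm": set_alarm}, cli={"battery": lambda: "level: 80"})
harness.execute(Action("set_alarm", {"time": "07:00"}), gui_agent=lambda a, s: False)
harness.execute(Action("battery"), gui_agent=lambda a, s: False)
harness.execute(Action("open_wifi_settings"), gui_agent=lambda a, s: s == 2)

for entry in harness.trace:
    print(entry.surface.value, entry.action, entry.ok, entry.detail)
print(passes(lambda st: "07:00" in st["alarms"], device, harness.trace))  # True
```

The fixed precedence (structured tool, then CLI, then GUI) is one simple way to make routing deterministic and reproducible across runs, and the step budget keeps open-ended GUI exploration from running unbounded. Under these assumptions, an agent that merely claimed "alarm set" without the alarm appearing in device state would fail `passes`, which mirrors the abstract's distinction between observable side effects and plausible final answers.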
