Hugging Face Daily Papers · 7 min read

PhoneWorld: Scaling Phone-Use Agent Environments

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.29486

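Published on May 28, 2026 · Submitted by Zhengyang Tang on May 29, 2026
Authors: Zhengyang Tang, Yuxuan Liu, Xin Lai, Junyi Li, Pengyuan Lyu, Jason, Yiduo Guo, Zhengyao Fang, Yang Ding, Yi Zhang, Weinong Wang, Huawen Shen, Xingran Zhou, Liang Wu, Fei Tang, Sunqi Fan, Shangpin Peng, Zheng Ruan, Anran Zhang, Benyou Wang, Rui Yan, Ji-Rong Wen, Chengquan Zhang, Han Hu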

Abstract

AI-generated summary: PhoneWorld is a pipeline that transforms real GUI trajectories and screenshots into controllable mobile environments, executable tasks, and automated verifiers, enabling scalable creation of phone-use benchmarks.

A central bottleneck for phone-use agents is that controllable, reproducible environments covering real mobile behavior are hard to build at scale. Existing mobile-agent benchmarks have made important progress on evaluation, but they do not by themselves provide a scalable way to construct many new phone-use environments. We present PhoneWorld, a reusable pipeline that converts real GUI trajectories and screenshots into controllable phone-use environments, executable tasks, automatic verifiers, and training rollouts. Rather than hand-building one mobile benchmark at a time, PhoneWorld uses real trajectories to recover which screens matter, how screens connect, which interactions must change environment state, and which user goals admit automatic verification. From these signals, it builds runnable mock Android apps backed by read-only app content and mutable state, then derives executable tasks, rule-based verifiers, and training rollouts from the same environments. In its current instantiation, PhoneWorld covers 34 apps across 16 domains, spanning common consumer mobile behaviors such as search, browsing, shopping, booking, media, and social interaction. Under a fixed training budget, replacing 10K steps from an auxiliary AndroidWorld corpus in an AndroidWorld-based baseline with broad PhoneWorld supervision improves all four evaluation benchmarks at once, raising HYMobileBench by 17.7 points, AndroidControl by 6.0 points, AndroidWorld by 14.7 points, and PhoneWorld by 52.5 points. We then study two additional scaling questions: increasing the amount of PhoneWorld supervision strongly improves PhoneWorld performance, and under a fixed PhoneWorld budget, expanding app coverage yields even larger gains. Overall, PhoneWorld shifts the focus from building one mobile benchmark at a time to scaling the supply of phone-use environments themselves.
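To make the abstract's environment structure concrete, here is a minimal Python sketch of a mock app that separates read-only app content from mutable state, together with a task and a rule-based verifier that inspects the final state. It is an illustration under stated assumptions, not PhoneWorld's actual code: every name here (`MockShopApp`, `Task`, `make_app`, the screen names, the actions, and the catalog entries) is hypothetical and does not come from the paper.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Callable, List, Mapping, Optional, Tuple


# All names below are hypothetical; none are taken from the PhoneWorld release.
@dataclass
class MockShopApp:
    # Read-only app content: visible to the agent, never modified.
    catalog: Mapping[str, float]
    # Screen graph: (current screen, action) -> next screen.
    transitions: Mapping[Tuple[str, str], str]
    # Mutable state: the only part that interactions may change.
    screen: str = "home"
    cart: List[str] = field(default_factory=list)

    def step(self, action: str, arg: Optional[str] = None) -> str:
        """Apply one action and return the name of the resulting screen."""
        if action == "add_to_cart":
            # A state-changing interaction, only valid on the product screen.
            if self.screen != "product" or arg not in self.catalog:
                raise ValueError(f"cannot add {arg!r} from screen {self.screen!r}")
            self.cart.append(arg)
            return self.screen
        key = (self.screen, action)
        if key not in self.transitions:
            raise ValueError(f"action {action!r} not available on {self.screen!r}")
        self.screen = self.transitions[key]
        return self.screen


@dataclass(frozen=True)
class Task:
    instruction: str
    # Rule-based verifier: a predicate over the final environment state.
    verify: Callable[[MockShopApp], bool]


def make_app() -> MockShopApp:
    """Build a fresh environment so every rollout starts from the same state."""
    catalog = MappingProxyType({"usb_cable": 4.99, "phone_case": 12.50})
    transitions = MappingProxyType({
        ("home", "open_search"): "search",
        ("search", "open_product"): "product",
        ("product", "back"): "search",
        ("search", "back"): "home",
    })
    return MockShopApp(catalog=catalog, transitions=transitions)


task = Task(
    instruction="Add the phone case to the cart.",
    verify=lambda app: app.cart == ["phone_case"],
)

app = make_app()
rollout = [("open_search", None), ("open_product", None), ("add_to_cart", "phone_case")]
for action, arg in rollout:
    app.step(action, arg)
print(task.verify(app))  # True
```

Because the verifier only checks the final mutable state, the same environment can score any rollout, whether it comes from a scripted trajectory or from an agent, without a human judge.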

Community

Zhengyang Tang (paper author, paper submitter), May 29, 2026:

Check out our new paper on a gym for phone-use/mobile agents!

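Librarian Bot, May 30, 2026:

This is an automated message from the Librarian Bot (https://huggingface.co/librarian-bots). I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

- Do Phone-Use Agents Respect Your Privacy? (2026), https://huggingface.co/papers/2604.00986
- KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation (2026), https://huggingface.co/papers/2604.08455
- OpenComputer: Verifiable Software Worlds for Computer-Use Agents (2026), https://huggingface.co/papers/2605.19769
- MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research (2026), https://huggingface.co/papers/2605.26114
- SimuWoB: Simulating Real-World Mobile Apps for Fast and Faithful GUI Agent Benchmarking (2026), https://huggingface.co/papers/2605.25160
- STAMP: Training Explicit Memory for Mobile GUI Agents in Controllable and Scalable Virtual Environments (2026), https://huggingface.co/papers/2605.29324
- ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents (2026), https://huggingface.co/papers/2604.11784

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend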

Get this paper in your agent:

hf papers read 2605.29486
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
