Hugging Face Daily Papers · June 23, 2026 · 4 min read

Training Open Models for Agentic Phone Use

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2606.23049

Published on Jun 22 · Submitted by taesiri on Jun 23 · Tencent Hunyuan

Authors:

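Zhengyang Tang, Xin Lai, Pengyuan Lyu, Xinyuan Wang, Tianyi Bai, Chenxin Li, Yiduo Guo, Huawen Shen, Yuxuan Liu, Junyi Li, Zhengyao Fang, Yang Ding, Yi Zhang, Weinong Wang, Xingran Zhou, Liang Wu, Fei Tang, Sunqi Fan, Shangpin Peng, Zheng Ruan, Anran Zhang, Benyou Wang, Ji-Rong Wen, Rui Yan, Chengquan Zhang, Han Hu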

Abstract

PhoneBuddy combines real and mock app environments to improve training of open models for phone use, demonstrating enhanced task success rates through mixed reinforcement learning approaches.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Phones are becoming an important execution surface for general-purpose agents, but training open models for reliable phone use remains difficult because the environment that matters at deployment, real devices running real apps, is slow, stateful, side-effectful, and hard to reset or verify, while scalable mock environments only approximate real behavior. We present PhoneBuddy, a training recipe and open-model line for agentic phone use that combines a real-app environment with a mock-app environment, PhoneWorld, which reconstructs runnable mock apps from real GUI usage structure. PhoneBuddy first builds a shared supervised fine-tuning stage from trajectories collected in both environments, then compares real-app RL against mixed RL across both environments. Across a 150-task human evaluation on real phones spanning apps, mini-apps, and cross-app workflows, task success rate improves from 36.67% after supervised fine-tuning to 40.67% after real-app RL and 45.33% after mixed RL. On AndroidWorld, the same progression rises from 60.3% to 77.2% to 83.2%. These results show that mock-app training is not a replacement for real-app RL, but a complementary source of scalable, resettable, and automatically checked interaction. The gains are strongest on app and mini-app tasks, while long-horizon cross-app workflows remain an important open challenge.
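To make the reported percentages concrete, here is a small arithmetic sketch (not code from the paper). It assumes each human-evaluation rate is simply successes divided by the 150 tasks, and it reports gains in percentage points over the supervised fine-tuning baseline. The helper names implied_successes and gains_over_baseline are illustrative, not part of any PhoneBuddy release. Task counts are not derived for AndroidWorld, because this page does not state how many tasks that benchmark run used.

```python
# Success rates (percent) as reported in the abstract.
TOTAL_HUMAN_EVAL_TASKS = 150
human_eval = {"SFT": 36.67, "real-app RL": 40.67, "mixed RL": 45.33}
android_world = {"SFT": 60.3, "real-app RL": 77.2, "mixed RL": 83.2}


def implied_successes(rate_pct, total=TOTAL_HUMAN_EVAL_TASKS):
    """Task count implied by a rate, assuming rate = successes / total."""
    return round(rate_pct / 100 * total)


def gains_over_baseline(rates, baseline="SFT"):
    """Gain of each later stage over the baseline, in percentage points."""
    return {
        stage: round(rate - rates[baseline], 2)
        for stage, rate in rates.items()
        if stage != baseline
    }


counts = {stage: implied_successes(rate) for stage, rate in human_eval.items()}
human_gains = gains_over_baseline(human_eval)
android_gains = gains_over_baseline(android_world)

print(counts)
print(human_gains)
print(android_gains)
```

Under that assumption, the human evaluation corresponds to 55, 61, and 68 successful tasks out of 150, so mixed RL adds 13 tasks (8.66 points) over supervised fine-tuning, compared with 6 tasks (4.0 points) for real-app RL alone. On AndroidWorld the corresponding gains are 16.9 and 22.9 points.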

arXiv: arxiv.org/abs/2606.23049 · Project page: https://phonebuddyai.github.io/ · GitHub: https://github.com/PhoneBuddyAI/phonebuddy

Community

noahml

about 14 hours ago

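This is a really interesting approach to the phone agent problem. Using a mix of real and mock environments to bridge the gap between simulation speed and real-world reliability makes a lot of sense, especially since resetting real apps is such a headache.

I'm curious whether you have any thoughts on why cross-app workflows are still lagging behind. Do you think the bottleneck is more the model's long-term memory or the complexity of moving between distinct app interfaces?

I made a podcast on it with ResearchPod, which makes it easy to get the key concepts on the go:
https://researchpod.app/episode/dd8dcb05-a1cd-43ec-a37f-f2a03b2509ac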

Get this paper in your agent:

hf papers read 2606.23049

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
