Hugging Face Daily Papers · 3 min read

MemGUI-Agent: An End-to-End Long-Horizon Mobile GUI Agent with Proactive Context Management

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arXiv: 2606.19926

Published on Jun 18, 2026 · Submitted by Guangyi Liu on Jun 24, 2026

Authors: Guangyi Liu, Gao Wu, Congxiao Liu, Pengxiang Zhao, Liang Liu, Mading Li, Qi Zhang, Mengyan Wang, Liang Guo, Yong Liu

Organization: kwai · Project page: https://memgui-agent.github.io/ · Code: https://github.com/kwai/MemGUI-Agent

Abstract

MemGUI-Agent addresses long-horizon mobile GUI task limitations through proactive context management using Context-as-Action (ConAct) to maintain critical information across extended sequences.

MLLM-based mobile GUI agents have made substantial progress on short-horizon tasks, yet remain unreliable on long-horizon tasks that require retaining intermediate facts across many steps and app transitions. We attribute this limitation to ReAct-style prompting, which passively accumulates per-step records, leading to prompt explosion and dilution of critical cross-app facts. To address this, we introduce MemGUI-Agent, an end-to-end long-horizon mobile GUI agent with proactive context management. MemGUI-Agent is built on Context-as-Action (ConAct), which casts context management as first-class actions emitted by the same policy that selects UI actions. Instead of passively appending history, ConAct maintains three structured context fields: folded action history, folded UI state, and recent step record, preserving critical UI facts while keeping context compact. To make proactive context management learnable across model scales, we construct MemGUI-3K, a 2,956-trajectory dataset with full ConAct annotations for supervised training and offline analysis. Training an 8B model on MemGUI-3K produces MemGUI-8B-SFT, an 8B MemGUI-Agent that achieves the best open-data 8B performance on MemGUI-Bench and generalizes to the out-of-distribution MobileWorld benchmark. Code, data, and trained models will be released at https://memgui-agent.github.io/.
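
The abstract names ConAct's three context fields but not their concrete format. The sketch below is only an illustration of the idea, not the authors' implementation: the class names, the `PolicyOutput` schema, the `apply_step` and `render_prompt` helpers, and the example task are all hypothetical. It also assumes that the policy rewrites the folded history at each step and that folded UI facts are merged by key. It shows one policy output carrying both a UI action and the context updates, so the prompt is rebuilt from three compact fields instead of an appended log.

```python
from dataclasses import dataclass, field


@dataclass
class ConActContext:
    """Three compact fields kept in place of an ever-growing step log."""
    folded_action_history: str = ""                       # running summary of past actions
    folded_ui_state: dict = field(default_factory=dict)   # key UI facts kept across apps
    recent_step_record: str = ""                          # only the latest step


@dataclass
class PolicyOutput:
    """One policy emission (hypothetical schema): a UI action plus context updates."""
    ui_action: str
    history_update: str   # assumed: the policy rewrites the folded history itself
    ui_facts: dict        # assumed: new facts are merged into the folded UI state
    step_record: str


def apply_step(ctx: ConActContext, out: PolicyOutput) -> ConActContext:
    """Apply the context-management part of a policy output."""
    return ConActContext(
        folded_action_history=out.history_update,
        folded_ui_state={**ctx.folded_ui_state, **out.ui_facts},
        recent_step_record=out.step_record,
    )


def render_prompt(task: str, ctx: ConActContext) -> str:
    """Build the next prompt from the three fields only."""
    facts = "; ".join(f"{k}={v}" for k, v in ctx.folded_ui_state.items())
    return (
        f"Task: {task}\n"
        f"History: {ctx.folded_action_history}\n"
        f"UI facts: {facts}\n"
        f"Last step: {ctx.recent_step_record}"
    )


# Made-up cross-app episode: find an address in one app, then switch apps.
steps = [
    PolicyOutput("open(Maps)", "Opened Maps.", {}, "Maps home screen shown."),
    PolicyOutput("search('cafe')", "Searched Maps for a cafe.",
                 {"cafe_address": "12 Elm St"}, "Result lists 12 Elm St."),
    PolicyOutput("open(Messages)", "Found cafe address in Maps; opened Messages.",
                 {}, "Messages inbox shown."),
]

ctx = ConActContext()
for out in steps:
    ctx = apply_step(ctx, out)

prompt = render_prompt("Text the cafe address to a friend", ctx)
print(prompt)
```

In this toy run the address found in the second step is still present in the prompt after the app switch, while the earlier per-step records are not carried forward verbatim.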
