MemGUI-Agent: An End-to-End Long-Horizon Mobile GUI Agent with Proactive Context Management
Authors: Guangyi Liu, Gao Wu, Congxiao Liu, Pengxiang Zhao, Liang Liu, Mading Li, Qi Zhang, Mengyan Wang, Liang Guo, Yong Liu
Published: 2026-06-18
arXiv: 2606.19926
GitHub: https://github.com/kwai/MemGUI-Agent
Abstract
MemGUI-Agent addresses the unreliability of mobile GUI agents on long-horizon tasks through proactive context management: Context-as-Action (ConAct) keeps critical information available across long action sequences.
MLLM-based mobile GUI agents have made substantial progress on short-horizon tasks, yet remain unreliable on long-horizon tasks that require retaining intermediate facts across many steps and app transitions. We attribute this limitation to ReAct-style prompting, which passively accumulates per-step records, leading to prompt explosion and dilution of critical cross-app facts. To address this, we introduce MemGUI-Agent, an end-to-end long-horizon mobile GUI agent with proactive context management. MemGUI-Agent is built on Context-as-Action (ConAct), which casts context management as first-class actions emitted by the same policy that selects UI actions. Instead of passively appending history, ConAct maintains three structured context fields: folded action history, folded UI state, and recent step record, preserving critical UI facts while keeping context compact. To make proactive context management learnable across model scales, we construct MemGUI-3K, a 2,956-trajectory dataset with full ConAct annotations for supervised training and offline analysis. Training an 8B model on MemGUI-3K produces MemGUI-8B-SFT, an 8B MemGUI-Agent that achieves the best open-data 8B performance on MemGUI-Bench and generalizes to the out-of-distribution MobileWorld benchmark. Code, data, and trained models will be released at https://memgui-agent.github.io/.
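The contrast the abstract draws between passively appending history and maintaining three structured context fields can be illustrated with a small sketch. This is not the authors' implementation: the class names `ConActContext` and `PolicyOutput`, the helper functions, the string-based field layout, and the scripted "policy" in the loop are all hypothetical, chosen only to show why overwriting folded fields keeps the prompt compact while a ReAct-style log grows with every step.

```python
from dataclasses import dataclass


@dataclass
class ConActContext:
    """The three context fields named in the abstract (string layout is assumed)."""
    folded_action_history: str = ""  # compact summary of earlier actions
    folded_ui_state: str = ""        # UI facts worth carrying across screens and apps
    recent_step_record: str = ""     # detailed record of the latest step only


@dataclass
class PolicyOutput:
    """One hypothetical policy emission: a UI action plus context-management actions."""
    ui_action: str
    new_action_history: str
    new_ui_state: str
    step_record: str


def apply_output(out: PolicyOutput) -> ConActContext:
    """Build the next context from what the policy emitted (overwrite, not append)."""
    return ConActContext(
        folded_action_history=out.new_action_history,
        folded_ui_state=out.new_ui_state,
        recent_step_record=out.step_record,
    )


def conact_prompt(task: str, ctx: ConActContext) -> str:
    """Render a prompt from the three fields; its size does not depend on step count."""
    return "\n".join([
        f"Task: {task}",
        f"Action history: {ctx.folded_action_history}",
        f"UI state: {ctx.folded_ui_state}",
        f"Last step: {ctx.recent_step_record}",
    ])


def react_prompt(task: str, records: list[str]) -> str:
    """ReAct-style baseline: every per-step record is appended to the prompt."""
    return "\n".join([f"Task: {task}", *records])


# Toy rollout with a scripted stand-in for the policy (illustrative values only).
task = "Copy the order total from the shop app into a note"
ctx = ConActContext()
react_records: list[str] = []
for step in range(1, 21):
    record = f"step {step}: tapped element {step} and observed screen {step}"
    react_records.append(record)
    out = PolicyOutput(
        ui_action=f"tap({step})",
        new_action_history=f"{step} steps completed so far",
        new_ui_state="order total = $42.10",
        step_record=record,
    )
    ctx = apply_output(out)

print(len(conact_prompt(task, ctx)), len(react_prompt(task, react_records)))
```

In this toy rollout the cross-app fact (the order total) stays in the folded UI state at every step, while the baseline prompt carries all twenty step records. In the paper's setting the folded fields are produced by the trained policy itself, which this sketch replaces with hard-coded strings.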