Hugging Face Daily Papers · 5 min read

OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2606.02031

OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents

Published on Jun 1 · Submitted by Rui Yang on Jun 2
Authors: Rui Yang, Qianhui Wu, Yuxi Chen, Hao Bai, Wenlin Yao, Hao Cheng, Baolin Peng, Huan Zhang, Tong Zhang, Jianfeng Gao

Abstract

AI-generated summary: OpenWebRL presents a framework for training visual web agents using online reinforcement learning on real websites, achieving open-source state-of-the-art performance with minimal initial supervision.

Building capable visual web agents requires long-horizon reasoning, precise grounding, and robust interaction with dynamic real-world websites. Despite rapid progress, the strongest systems remain largely proprietary, while open agents still depend heavily on supervised post-training over large collections of curated web trajectories. This dependence creates a major scalability bottleneck: high-quality demonstrations are expensive to collect, and static datasets offer limited coverage of the diverse, ever-changing open web. Although online RL has shown promise for text-based agents, its potential for training visual web agents directly on live websites remains largely underexplored. In this paper, we introduce OpenWebRL, an open framework for training visual web agents with online multi-turn RL on real websites. OpenWebRL covers the full training pipeline, including scalable live-browser infrastructure, supervised initialization, multimodal context management, trajectory-level success judging, and efficient multi-turn policy optimization. Using this framework, we train OpenWebRL-4B, which establishes a new open-source state of the art on challenging live-web benchmarks. With only 0.4K initialization trajectories and 2.2K open-ended RL training tasks, OpenWebRL-4B achieves 67.0% success on Online-Mind2Web and 64.0% on DeepShop, outperforming prior open agents of similar or larger scale and remaining competitive with proprietary systems including OpenAI CUA and Gemini CUA. Beyond strong benchmark performance, we systematically study the key design choices that make online RL effective for visual web agents, and analyze how RL improves agentic reasoning. Overall, our work offers a practical path toward building more capable, reproducible, and cost-efficient open web agents. We will release our training data, models, and code to support future research.
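To make "trajectory-level success judging" combined with multi-turn policy optimization more concrete, here is a minimal sketch of one common way such a signal can be used. It is an illustration under stated assumptions, not the method from the paper: the abstract does not specify the optimizer, so the group-normalized advantage below, the binary judge verdict, and every name (`Trajectory`, `turn_level_advantages`, the task id) are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass
class Trajectory:
    """One multi-turn rollout on a task (hypothetical container)."""
    task_id: str
    num_turns: int  # number of agent actions taken in the browser
    success: bool   # verdict of a trajectory-level judge (assumed binary)


def turn_level_advantages(group: List[Trajectory], eps: float = 1e-6) -> List[List[float]]:
    """Broadcast a group-normalized trajectory reward to every turn.

    Assumption: all trajectories in `group` are rollouts of the same task,
    and the only reward is the judge's success verdict for the whole rollout.
    """
    rewards = [1.0 if t.success else 0.0 for t in group]
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    advantages = []
    for traj, reward in zip(group, rewards):
        a = (reward - mu) / (sigma + eps)
        # Every turn in the rollout shares the same trajectory-level signal.
        advantages.append([a] * traj.num_turns)
    return advantages


# Four hypothetical rollouts of the same task, two judged successful.
group = [
    Trajectory("find-laptop", num_turns=5, success=True),
    Trajectory("find-laptop", num_turns=8, success=False),
    Trajectory("find-laptop", num_turns=6, success=True),
    Trajectory("find-laptop", num_turns=7, success=False),
]
adv = turn_level_advantages(group)
```

In this sketch, successful rollouts receive a positive advantage on every turn and failed rollouts a negative one. A group in which all rollouts share the same verdict yields zero advantages, so it contributes no learning signal.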

Community

OpenWebRL presents a fully open framework for training visual web agents with online multi-turn reinforcement learning on real websites. It covers the full pipeline from live-browser infrastructure and supervised initialization to context management, trajectory-level judging, and policy optimization. The resulting OpenWebRL-4B achieves strong open-source performance on challenging live-web benchmarks, offering a practical and reproducible path toward more capable open web agents. Code (https://github.com/OpenWebRL/OpenWebRL) and data (https://huggingface.co/OpenWebRL) are all open-sourced.

Get this paper in your agent:

hf papers read 2606.02031
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
