r/MachineLearning

WebHarbor: we "dock" real websites into local environments for web agents! [R]

Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.

Hello! We're excited to share our latest community-driven research project, WebHarbor: Docking Real Websites for Evolving GUI Agent Environments!

TL;DR: 15 popular websites (Amazon, GitHub, BBC News, arXiv, Booking, Hugging Face, etc.) packaged as self-contained Flask + SQLite apps in a single Docker image, with a control plane that resets each site to a byte-identical state in <1 second, all built with a human-in-the-loop coding agent (e.g., Claude Code or Codex). We support all 643 WebVoyager tasks out of the box.
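To make the "byte-identical reset" idea concrete, here is a minimal sketch of one way such a reset could work for a SQLite-backed site: keep a pristine snapshot file and copy it over the live database. This is an illustration under stated assumptions, not WebHarbor's actual control plane. The function names (`make_snapshot`, `reset_site`, `demo`), the file names, and the `cart` table are all hypothetical.

```python
import hashlib
import shutil
import sqlite3
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def make_snapshot(snapshot: Path) -> None:
    """Create a tiny pristine database (hypothetical schema, for illustration)."""
    conn = sqlite3.connect(snapshot)
    conn.execute("CREATE TABLE cart (item TEXT, qty INTEGER)")
    conn.execute("INSERT INTO cart VALUES ('book', 1)")
    conn.commit()
    conn.close()


def reset_site(snapshot: Path, live: Path) -> None:
    """Restore the live database by copying the snapshot over it.

    The copy goes to a temporary file first and is then renamed into place,
    so a reader never sees a half-written database file.
    """
    tmp = live.with_suffix(".tmp")
    shutil.copyfile(snapshot, tmp)
    tmp.replace(live)


def demo() -> bool:
    """Mutate the live database as an agent might, then reset and compare hashes."""
    with tempfile.TemporaryDirectory() as d:
        snapshot = Path(d) / "snapshot.db"
        live = Path(d) / "live.db"
        make_snapshot(snapshot)
        reset_site(snapshot, live)

        # Simulate an agent episode that changes site state.
        conn = sqlite3.connect(live)
        conn.execute("INSERT INTO cart VALUES ('laptop', 2)")
        conn.commit()
        conn.close()
        changed = sha256_of(live) != sha256_of(snapshot)

        # Reset and check that the live file matches the snapshot byte for byte.
        reset_site(snapshot, live)
        restored = sha256_of(live) == sha256_of(snapshot)
        return changed and restored
```

Calling `demo()` checks that the live file differs from the snapshot after the simulated episode and matches it again after the reset. A real deployment would also need to close or reopen the app's database connections around the swap, which this sketch leaves out.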

Call for contributions: our next goal is 100+ popular websites, covering all of Online-Mind2Web (147 sites) and beyond. Two tracks:

  • Contribute a new mirror site (run the coding-agent pipeline → human verification → open a PR) → co-authorship on the final paper
  • Review submitted PRs (5 reviews → co-authorship)

We have also released useful skills for you (or your coding agent) to work with! Typically you can create a new mirror within 1 day. See more contribution details in the Contribution Guide.

Why WebHarbor: running web agent benchmarks on the live web is a nightmare because of reCAPTCHA, geo-blocks, content drift, network flakiness, and tasks that go stale within months. Plus, you can't reset the live web, which rules out heavy RL training. Web agents need lightweight, easy-to-reset, task-driven, evolving environments for both evaluation and training!

Related Resources:

  • 🏠 WebHarbor Project Page: WebHarbor
  • 🤗 HuggingFace Dataset: ChilleD/WebHarbor
  • 💻 WebHarbor GitHub: Code Repo
  • 📊 Contribution Guide: Guide Details
  • 📝 Contribution Request Form: Google Form

Suggestions and discussion are welcome!

submitted by /u/ArtichokeHelpful7462