r/LocalLLaMA · 2 min read

[browser-use-wasm] I made a browser-use agent that runs in WASM at zero cost

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.


The only cost is electricity! I built this in a few weeks since I couldn't find anything else like it.

Demo: https://pdufour.github.io/browser-use-wasm/
Source Code: https://github.com/pdufour/browser-use-wasm

One thing I've wanted to do for a while was add a widget to my page that lets me control the entire webpage, just like any of the browser-use agents can. The key distinction is that I wanted it to be fully self-contained, with no server involved.

After a few weeks of tinkering, I have a fairly good browser-use model running entirely in the browser via Snapdom / WASM / WebGPU / Wllama / ShowUI-2B, plus a little JS to tie it all together.
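
The post doesn't show the glue code, so here is a minimal sketch of one piece of it under stated assumptions: the model is assumed to answer a grounding query with relative coordinates in [0, 1] such as `[0.73, 0.21]`, and the screenshot is assumed to cover exactly the visible viewport. `parseRelativePoint` and `toViewportPoint` are hypothetical helpers, not APIs from Snapdom, Wllama, or browser-use-wasm.

```typescript
// Hypothetical glue code; helper names are illustrative, not from browser-use-wasm.
// Assumption: the model answers with relative coordinates in [0, 1], e.g. "[0.73, 0.21]".

interface Point {
  x: number;
  y: number;
}

interface Viewport {
  width: number; // CSS pixels, e.g. window.innerWidth
  height: number; // CSS pixels, e.g. window.innerHeight
}

// Pull the first two numbers out of the model output; reject anything outside [0, 1].
function parseRelativePoint(output: string): Point | null {
  const nums = output.match(/-?\d+(?:\.\d+)?/g);
  if (nums === null || nums.length < 2) return null;
  const x = Number(nums[0]);
  const y = Number(nums[1]);
  if (x < 0 || x > 1 || y < 0 || y > 1) return null;
  return { x, y };
}

// Scale relative coordinates to CSS pixels.
// Assumes the screenshot covers exactly the visible viewport (no scroll offset, no cropping).
function toViewportPoint(rel: Point, vp: Viewport): Point {
  return { x: Math.round(rel.x * vp.width), y: Math.round(rel.y * vp.height) };
}
```

In a real page, the resulting point could then be passed to `document.elementFromPoint(x, y)` to find the element to act on.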

The browser-use library I developed can handle all of the following:

  • Typing into fields
  • Clicking links
  • Multi-turn actions from a single prompt (click an input, type something into it, click the submit button); this works about 50% of the time
  • Changing dropdown options
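
Multi-turn actions from one prompt imply some kind of loop. A minimal sketch, assuming a hypothetical `Action` type covering the four supported behaviours plus a `done` signal, and hypothetical `predictNext` / `execute` hooks standing in for the screenshot-plus-model call and the DOM interaction (none of these names come from the project):

```typescript
// Hypothetical action space: the four supported behaviours plus a stop signal.
type Action =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "select"; value: string }
  | { kind: "done" };

type StopReason = "done" | "max-steps" | "repeated-action";

interface RunResult {
  actions: Action[];
  reason: StopReason;
}

// Observe -> predict -> act loop for a single user prompt.
// `predictNext` and `execute` are hypothetical hooks: in the browser they would
// wrap the screenshot + model call and the DOM interaction respectively.
async function runTask(
  prompt: string,
  predictNext: (prompt: string, history: Action[]) => Promise<Action>,
  execute: (action: Action) => Promise<void>,
  maxSteps = 5,
): Promise<RunResult> {
  const actions: Action[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const next = await predictNext(prompt, actions);
    if (next.kind === "done") return { actions, reason: "done" };
    // Heuristic stop: the exact same action proposed twice in a row
    // usually means the page did not change and the model is stuck.
    const prev = actions[actions.length - 1];
    if (prev && JSON.stringify(prev) === JSON.stringify(next)) {
      return { actions, reason: "repeated-action" };
    }
    await execute(next);
    actions.push(next);
  }
  return { actions, reason: "max-steps" };
}
```

The three stopping rules (an explicit `done`, a step cap, and a repeated-action check) are one guess at the "when to end the loop" problem, not a solution to it.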

Some lessons I learned along the way that others might find helpful:

  1. Tests are your friend. Finding Mind2Web https://github.com/OSU-NLP-Group/Mind2Web and MiniWoB++ https://github.com/Farama-Foundation/miniwob-plusplus helped me continuously improve the accuracy of the browser-use actions.
  2. Browser use is very, very hard. I only support a limited set of actions, and even getting to that point was quite hard. To handle complex queries you need some kind of interaction loop, but then you run into problems like figuring out when to end the loop.
  3. Accuracy matters. For the longest time my clicks were off by a few pixels, and I was finally able to trace the issue to the Snapdom library. A click that is off by a few pixels can land in blank space instead of on a button. I'm so glad this is fixed: https://github.com/zumerlab/snapdom/issues/421.
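
Lessons 1 and 3 fit together: scoring each predicted click against the target element's bounding box is one way to make a few-pixel drift show up in tests. The sketch below assumes a hypothetical `ClickCase` shape (a predicted point and the target's box in the same CSS-pixel space); Mind2Web and MiniWoB++ data would have to be converted into it, and it does not reproduce the Snapdom bug itself.

```typescript
// Hypothetical test-case shape; real Mind2Web / MiniWoB++ data would need converting.
interface Box {
  left: number;
  top: number;
  width: number;
  height: number;
}
interface Point {
  x: number;
  y: number;
}
interface ClickCase {
  predicted: Point;
  target: Box;
}

// A click counts as correct only if it lands inside the target element's box (edges inclusive).
function hits(p: Point, b: Box): boolean {
  return p.x >= b.left && p.x <= b.left + b.width && p.y >= b.top && p.y <= b.top + b.height;
}

// Fraction of cases whose predicted click hits its target.
function clickAccuracy(cases: ClickCase[]): number {
  if (cases.length === 0) return 0;
  const correct = cases.filter((c) => hits(c.predicted, c.target)).length;
  return correct / cases.length;
}

// Shift every prediction by a constant offset to measure how sensitive accuracy is
// to small coordinate errors (e.g. a capture misaligned by a few pixels).
function withOffset(cases: ClickCase[], dx: number, dy: number): ClickCase[] {
  return cases.map((c) => ({ ...c, predicted: { x: c.predicted.x + dx, y: c.predicted.y + dy } }));
}
```

With a small button, shifting predictions by only 4 px can turn a hit near the edge into a miss, which is the kind of regression a `withOffset` check could surface early.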

This code is super, super alpha and a lot of stuff is probably broken, but I thought I would share it with Reddit to get feedback and see if people have any ideas on how to develop this further. I'm open to any ideas!

submitted by /u/dammitbubbles
