r/LocalLLaMA · 3 min read

I built a computer-use sandbox framework for Codex on headless Linux. GPU passthrough, computer use, and sudo access for Codex all work. It's the perfect dev sandbox for full-auto work while minimizing the "rm -rf /" risk.

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

I've been working with agents for months now and haven't found a sandbox environment that "just works", so I built one!

My requirements were as follows:

  1. Agent is unable to destroy my host OS but able to install software and run sudo commands
  2. Agent is able to browse the web autonomously and validate the UI it creates
  3. GPU access works (even on the DGX Spark, which can't pass a GPU through to a VM)
  4. Docker works
  5. Persistent environment I can set up once: log into the internet accounts I want the agent to access, copy in my .env files, install custom software, etc.
  6. Support multiple parallel browser use / development sessions concurrently
  7. Easily log into each agent's desktop to view the work it's doing, or manually set up the agent environment via a desktop interface

The inspiration for this project is wanting a sandbox I can let the agent run free in, while limiting the damage it can do. I want it to be able to browse the web, do automated AI research on my GPU, test my docker containers in a sandbox, develop my webapp full-auto, or whatever other task I need it to do while still being safely in a sandbox and unable to wipe or modify my host system.

I felt like I either had to go full YOLO mode on my host machine and risk a catastrophic failure, or let my agent work inside the default Codex sandbox, which is extremely annoying to use.

My code is available here:

https://github.com/fieryWaters/ai-sandbox-manager

It was developed and tested on the DGX Spark, where this is especially difficult to get working: on the unified architecture you can't pass the GPU through to a VM. With minimal modifications, it should also work on macOS or Windows WSL.

The core idea behind the sandbox is basically a VM. You set up the VM for your agent as if it were your own desktop OS that you develop on. Once it's set up, you save the image as a template. From there you can spin up as many copies as you like and let your agent run free with full sudo access.

Because true VMs can't share resources like a GPU, I chose to create the image as an LXC container. This allows multiple sandbox instances to share one GPU, so you could run multiple agents doing smoke-test training runs on tiny models to build out different features autonomously and in parallel, similar to Karpathy's autogpt project.
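A rough sketch of the template-and-clone flow described above, in Python. It assumes the LXD `lxc` command-line client; the repo may use different LXC tooling, and GPU setup on the DGX Spark may need more than this. The instance names, the `gpu0` device name, and the `clone_commands` helper are hypothetical. The function only builds the commands, so nothing runs until you pass each one to `subprocess.run`.

```python
from typing import List


def clone_commands(template: str, count: int, prefix: str = "agent") -> List[List[str]]:
    """Build the commands that clone `template` into `count` GPU-enabled sandboxes.

    Assumption: the LXD `lxc` client is the container manager in use.
    All names here are illustrative, not taken from the repo.
    """
    commands: List[List[str]] = []
    for n in range(1, count + 1):
        name = f"{prefix}-{n}"
        # Copy the prepared template container to a fresh instance.
        commands.append(["lxc", "copy", template, name])
        # Attach the host GPU as a shared device (LXD device type "gpu").
        commands.append(["lxc", "config", "device", "add", name, "gpu0", "gpu"])
        # Boot the new sandbox.
        commands.append(["lxc", "start", name])
    return commands


if __name__ == "__main__":
    # Print the plan instead of executing it.
    for cmd in clone_commands("agent-template", 2):
        print(" ".join(cmd))
```

Each clone gets its own filesystem, so an agent with sudo inside one sandbox cannot touch the template or its siblings.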

For computer use, I have https://github.com/trycua/cua to thank. This project works amazingly well, and it fills a real gap, since computer use on Linux is currently not supported by default.

I set up a hook for Codex that blocks git pushes, but in a later version I might refine it to block only force pushes. The idea is that the agent can't do anything critically damaging, like rewriting the git history. You go in periodically and push the changes yourself after you validate them.
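The push policy above could be sketched as a small command classifier. This is not the hook from the repo, and it says nothing about how Codex invokes hooks. It is a minimal illustration, assuming the hook receives the shell command as a single string. `classify_git_command` and `is_allowed` are hypothetical names, and compound commands such as `cd repo && git push` are not handled.

```python
import shlex

# Flags that make `git push` overwrite remote history.
FORCE_FLAGS = {"--force", "-f", "--force-with-lease", "--force-if-includes", "--mirror"}


def classify_git_command(command: str) -> str:
    """Return 'force_push', 'push', or 'other' for one simple shell command."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return "other"  # unbalanced quotes: not something we can parse
    if not tokens or tokens[0] != "git":
        return "other"
    # Skip global options that precede the subcommand, e.g. `-C <path>`.
    i = 1
    while i < len(tokens) and tokens[i].startswith("-"):
        i += 2 if tokens[i] in ("-C", "-c") else 1
    if i >= len(tokens) or tokens[i] != "push":
        return "other"
    for arg in tokens[i + 1:]:
        # A refspec with a leading '+' is also a force push.
        if arg in FORCE_FLAGS or arg.startswith("--force-with-lease=") or arg.startswith("+"):
            return "force_push"
    return "push"


def is_allowed(command: str, block_all_pushes: bool = True) -> bool:
    """Current policy blocks every push; the relaxed one blocks only force pushes."""
    kind = classify_git_command(command)
    if kind == "force_push":
        return False
    if kind == "push":
        return not block_all_pushes
    return True
```

Setting `block_all_pushes=False` corresponds to the later, relaxed version that only stops force pushes.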

I wouldn't call this ai-sandbox-manager repo polished; it's more of a proof of concept. But I find it truly useful for my personal work and it solves a real problem I have, so I wanted to share it. If anyone wants to help build it out for macOS, Windows, or WSL, feel free to make a PR. Otherwise, feel free to clone it and adapt it to your personal workflows.

submitted by /u/superSmitty9999