r/LocalLLaMA · · 2 min read

How I use the recent llama.cpp native tools to do web RAG, a.k.a. web_fetch (or anything else for that matter), directly from inside the llama-server webui

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

Like some other fellow LLMers, I discovered a few days ago that the amazing llama.cpp project has just added native tool functionality to the server.

After enabling the relevant options in llama-server and playing a bit with the most harmless of them all, get_datetime, I bit the bullet and cautiously enabled the big boss: exec_shell_command.

Building on my recent sandboxing efforts with the pi coding agent, another fantastic tool, I implemented this workflow to use it more safely on Linux through several layers of sandboxing:

step 0) enable the llama-server options for native tools

step 1) install firejail system-wide

step 2) create a new Linux user called vmagents (a.k.a. "virtual machine agent smith") to prevent privilege escalation or messing with my own user's home dir

step 3) log in as the vmagents user and install smolmachines, an easy-to-use OCI virtual machine container harness

step 4) create a VM called minivm and start it so that it pulls in a bare-bones, BusyBox-based Alpine Linux OCI image

step 5) create the script minivm-exec (and make it executable) in the vmagents user's exec dir; it spins up the sandbox VM, runs a given command in it wrapped in a further firejail sandbox, then turns the VM off

step 6) in my own usual user's exec dir, create another script (and make it executable) called vm-exec that invokes the previous minivm-exec script as the vmagents user

step 7) in the llama-server webui, run a prompt such as this one:

Retrieve today's latest news for Italy and tell me which one is the most charming. Prepend any command to be executed with the sandboxing wrapper vm-exec. Use wget to fetch web content, adding the option "-U Mozilla" as the browser user agent string

DONE!!!
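To make the prompt more concrete, here is a dry-run sketch of the kind of command it asks the model to build. The helper name fetch_cmd and the URL are purely illustrative assumptions, the actual commands the model issues will vary, and the snippet only prints the command instead of running it:

```shell
#!/bin/sh
# fetch_cmd: hypothetical helper that prints (does not run) a wget call
# prefixed with the vm-exec sandboxing wrapper, as the prompt requests.
fetch_cmd() {
    # -U sets the user agent, -q silences progress, -O - writes to stdout.
    printf 'vm-exec wget -U Mozilla -q -O - %s\n' "$1"
}

fetch_cmd https://example.com/news
# prints: vm-exec wget -U Mozilla -q -O - https://example.com/news
```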

The steps above in detail:

0 ) llama-server --model Qwen3.6-35B-A3B_MTP-UD-Q8_K_XL.gguf --flash-attn on --no-mmap --jinja --threads-http 4 --prio 2 --tools get_datetime,exec_shell_command --temp 0.6 --top-p 0.95 --top-k 20 --presence-penalty 1.5 --min-p 0.00 --chat-template-kwargs '{"preserve_thinking":true}' --spec-type draft-mtp --spec-draft-n-max 1

1 ) yay -Sy firejail (or use sudo pacman on Manjaro/Arch Linux)

2 ) sudo useradd -m vmagents; sudo passwd vmagents

3.1 ) sudo su - vmagents

3.2 ) curl -sSL https://smolmachines.com/install.sh | bash

4.1 ) smolvm machine create minivm --image alpine --net

4.2 ) smolvm machine start --name minivm

5 ) /home/vmagents/.local/bin/minivm-exec

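#!/bin/sh

# Start the sandbox VM, discarding its status output.
smolvm machine start --name minivm >/dev/null

# Run the given command in the VM under firejail; "$@" keeps each argument intact.
firejail smolvm machine exec --name minivm -- "$@" 2>/dev/null

# Turn the VM off again.
smolvm machine stop --name minivm >/dev/null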
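One detail worth knowing about a start/exec/stop wrapper like this: a shell script returns the status of its last command, which here is the VM shutdown rather than the command that was run. Below is a minimal sketch of preserving the command's exit status. The names start_vm, run_cmd, stop_vm and run_in_vm are hypothetical stand-ins for the three smolvm calls, stubbed out so the snippet runs anywhere:

```shell
#!/bin/sh
# Hypothetical stand-ins; in the real wrapper these would be the smolvm
# start / exec / stop commands from the script above.
start_vm() { :; }
run_cmd()  { "$@"; }
stop_vm()  { :; }

run_in_vm() {
    start_vm || return 1
    status=0
    run_cmd "$@" || status=$?   # remember the command's exit status
    stop_vm                     # shut the VM down whether or not it failed
    return "$status"
}

run_in_vm sh -c 'exit 3' || echo "command exited with status $?"
# prints: command exited with status 3
```

With this shape the caller (and so the model) can tell a failed fetch from a successful one, at the cost of a few extra lines.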

6 ) /home/<MYUSER>/.local/bin/vm-exec

#!/bin/sh

sudo su - vmagents -c "minivm-exec $*"
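Both wrappers forward their arguments, and the string handed to su -c is parsed again by another shell, so quoting can get lost along the way (for example a user agent containing a space). The sketch below first shows how unquoted $* differs from "$@", then a helper that re-quotes arguments for a second shell. count_args, forward_star, forward_at and quote_args are hypothetical names introduced only for this illustration:

```shell
#!/bin/sh
# count_args: prints how many arguments it received.
count_args() { echo "$#"; }

# Unquoted $* re-splits every argument on whitespace.
forward_star() { count_args $*; }

# "$@" passes each argument through unchanged.
forward_at() { count_args "$@"; }

forward_star wget -U "Mozilla Firefox" https://example.com   # prints 5
forward_at   wget -U "Mozilla Firefox" https://example.com   # prints 4

# quote_args: single-quotes each argument (escaping embedded single quotes)
# so the resulting string survives being parsed by another shell.
quote_args() {
    out=""
    for arg in "$@"; do
        # Turn each ' into '\'' and wrap the whole argument in single quotes.
        escaped=$(printf '%s' "$arg" | sed "s/'/'\\\\''/g")
        out="$out '$escaped'"
    done
    printf '%s\n' "${out# }"
}

quote_args wget -U "Mozilla Firefox" https://example.com
# prints: 'wget' '-U' 'Mozilla Firefox' 'https://example.com'

# Possible use inside vm-exec (an untested assumption, not the post's script):
#   sudo su - vmagents -c "minivm-exec $(quote_args "$@")"
```

For simple commands without spaces or quotes in their arguments, the plain $* form behaves the same, which is likely why it works for the wget prompt above.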

submitted by /u/DevelopmentBorn3978