How I use the recent llama.cpp native tools to do web RAG, a.k.a. web_fetch (or anything else, for that matter), directly from inside llama-server's webui
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Like some other fellow LLMers, I discovered a few days ago that the amazing llama.cpp project has just added native tool functionality to the server.
After enabling the relevant options in llama-server and playing a bit with the most harmless of them all, get_datetime, I bit the bullet and cautiously enabled the big boss: exec_shell_command.
Building on my recent sandboxing efforts with the pi coding agent, another fantastic tool, I implemented this workflow to use it more safely on Linux through multiple layers of sandboxing:
step 0) enable the llama-server options for native tools
step 1) install firejail system-wide
step 2) create a new Linux user called vmagents (a.k.a. "virtual machine agent smith") to prevent privilege escalation or messing with my own user's home dir
step 3) log in as the vmagents user and install smolmachines, an easy-to-use harness for OCI virtual machine containers
step 4) create a VM called minivm and start it so that it pulls in a bare-bones, busybox-based Alpine Linux OCI image
step 5) create the script minivm-exec (and make it executable) in the vmagents user's exec dir; it spins up the sandbox VM, executes a given command in it through a further firejail sandbox, then turns the VM off
step 6) in my own usual user's exec dir, create another script (and make it executable) called vm-exec that invokes the previous minivm-exec script using the vmagents user's credentials
step 7) in the llama-server webui, run a prompt like this one, for example:
Retrieve today's latest news for Italy and tell me which one is the most charming. Prepend any command to be executed with the sandboxing wrapper vm-exec. Use wget to fetch web content, adding the option "-U Mozilla" as the browser user agent string
DONE!!!
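To make the prompt in step 7 more concrete, here is a rough sketch of the kind of command the model ends up sending to exec_shell_command. The helper name build_fetch_cmd and the example URL are hypothetical, and the extra wget options -q and -O - are my assumption (the prompt only asks for -U Mozilla); the actual command is whatever the model decides to emit.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a fetch command of the shape
# the prompt asks the model to produce. Nothing is sent over the network here.
build_fetch_cmd() {
    url=$1
    # -U sets the user agent string (requested in the prompt).
    # -q and -O - are assumptions: quiet mode, page body written to stdout.
    printf 'vm-exec wget -U Mozilla -q -O - %s\n' "$url"
}

# Placeholder URL, not taken from the original post.
build_fetch_cmd "https://example.com/"
```

Printing the command instead of running it is handy for checking that the wrapper prefix is really there before letting the model loose on a shell.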
The steps above in detail:
0 ) llama-server --model Qwen3.6-35B-A3B_MTP-UD-Q8_K_XL.gguf --flash-attn on --no-mmap --jinja --threads-http 4 --prio 2 --tools get_datetime,exec_shell_command --temp 0.6 --top-p 0.95 --top-k 20 --presence-penalty 1.5 --min-p 0.00 --chat-template-kwargs '{"preserve_thinking":true}' --spec-type draft-mtp --spec-draft-n-max 1
1 ) yay -Sy firejail (or sudo pacman -Sy firejail on Manjaro/Arch Linux)
2 ) sudo useradd -m vmagents; sudo passwd vmagents
3.1 ) sudo su - vmagents
3.2 ) curl -sSL https://smolmachines.com/install.sh | bash
4.1 ) smolvm machine create minivm --image alpine --net
4.2 ) smolvm machine start --name minivm
5 ) /home/vmagents/.local/bin/minivm-exec
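#!/bin/sh
# Start the VM, run the given command in it through firejail, then stop the VM.
smolvm machine start --name minivm >/dev/null
# "$@" forwards each argument unchanged; an unquoted $* would re-split and glob them.
firejail smolvm machine exec --name minivm -- "$@" 2>/dev/null
smolvm machine stop --name minivm >/dev/null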
6 ) /home/<MYUSER>/.local/bin/vm-exec
#!/bin/sh
sudo su - vmagents -c "minivm-exec $*"
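One caveat with the vm-exec wrapper: the string given to su -c is parsed again by the vmagents login shell, so arguments containing spaces or quotes can be split or mangled on the way. Below is a minimal sketch of one possible workaround, assuming a POSIX sh and sed; quote_args is a hypothetical helper name, not part of the original scripts, and it only helps if minivm-exec forwards its arguments with "$@" rather than an unquoted $*.

```shell
#!/bin/sh
# Hypothetical helper: wrap every argument in single quotes so that it
# survives the extra round of shell parsing done by `su -c`.
quote_args() {
    for arg in "$@"; do
        # Turn each embedded ' into '\'' and wrap the result in single quotes.
        printf "'%s' " "$(printf '%s' "$arg" | sed "s/'/'\\\\''/g")"
    done
}

# Possible use inside vm-exec (left commented out so this sketch runs anywhere):
# sudo su - vmagents -c "minivm-exec $(quote_args "$@")"

# Show what the quoted command line looks like for a sample invocation.
quote_args wget -U "Mozilla/5.0 (X11)" "https://example.com/"
echo
```

This is only a sketch: arguments ending in newlines lose them in the command substitution, which should not matter for typical wget or busybox commands.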