r/LocalLLaMA · May 19, 2026 · 1 min read

Is there a way to disable reasoning per request in llama.cpp's llama-server, while leaving it on by default?

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

Title. I've got a llama.cpp server running a model that a number of scripts access. Some of those scripts give the model easier tasks than others, and the easier ones are also latency-sensitive. Rather than host two servers with different parameters, I'd rather just send something along with the prompt to disable reasoning for that request.

If I must host multiple servers, can I run two servers for the same model while keeping the model loaded in memory only once? I'm VRAM limited, like most of you I'm sure.

submitted by /u/Mrinohk