r/LocalLLaMA · 1 min read

llama.cpp server: How do the -np and -c flags interact?

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

I've been using LM Studio for a few months. I want to try Hermes agents with Qwen 3.6 MoE, so I'm switching to llama.cpp, and I don't fully understand how the number of server slots (-np) and the context size (-c) interact.

The total context appears to be split equally across server slots, so each parallel client gets c / np tokens of context.
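To make the arithmetic concrete, here is a minimal Python sketch of the even split described above. It only models the c / np observation from this post; it does not call llama.cpp, and the function names and the example numbers (including the model limit) are hypothetical.

```python
def per_slot_context(total_ctx: int, n_slots: int) -> int:
    """Tokens of context per slot, assuming -c is divided evenly across -np slots."""
    if n_slots < 1:
        raise ValueError("n_slots must be at least 1")
    return total_ctx // n_slots


def exceeds_model_limit(total_ctx: int, n_slots: int, model_max_ctx: int) -> bool:
    """True when a single slot would get more context than the model supports."""
    return per_slot_context(total_ctx, n_slots) > model_max_ctx


if __name__ == "__main__":
    # Hypothetical values: -c 32768 with -np 4, and an assumed model limit of 8192.
    print(per_slot_context(32768, 4))           # 8192 tokens per client
    print(exceeds_model_limit(32768, 4, 8192))  # False: each slot fits the limit
    print(exceeds_model_limit(32768, 2, 8192))  # True: 16384 per slot is over it
```

Under that assumption, the second question below is about the case where `exceeds_model_limit` returns True.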

I have some questions:

- What are the consequences of launching the server with a larger context (-c) than the model supports?

- What if c / np is greater than the model's maximum context? Are there any downsides for model performance?

- If a rig has enough VRAM to allocate twice the maximum context size, is it twice as energy- and time-efficient to serve two agents in parallel rather than sequentially?

submitted by /u/Doug_Fripon