Here's a llama.cpp CLI Command builder.
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
No accounts or sign-up. No email requirements. No pop-ups and no cookies. No ads. Info is saved locally in your browser so you don't lose any progress.
It's got every single flag and argument that could be found in the documentation. Tooltips are added to everything. Every field is editable.
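To give a rough idea of what a builder like this does under the hood, here is a minimal Python sketch that turns a set of editable fields into a command string. It is not the tool's actual code: `build_command` and the model path are made up for illustration, and the flags shown (`-m`, `--ctx-size`, `--n-gpu-layers`, `--port`, `--mlock`, `--no-mmap`) are just a handful of common llama.cpp options, so check them against the documentation for your build.

```python
import shlex

def build_command(binary, options):
    """Turn a dict of flag -> value into a shell-safe command string.

    Hypothetical helper for illustration; not part of llama.cpp or the tool.
    """
    parts = [binary]
    for flag, value in options.items():
        if value is None or value is False:
            continue  # unset fields and disabled switches are left out
        parts.append(flag)
        if value is not True:  # True means a bare switch with no argument
            parts.append(str(value))
    return shlex.join(parts)  # quotes any argument that needs it

cmd = build_command("llama-server", {
    "-m": "models/example.gguf",  # placeholder model path
    "--ctx-size": 8192,
    "--n-gpu-layers": 99,
    "--port": 8080,
    "--mlock": True,     # bare switch, enabled
    "--no-mmap": False,  # bare switch, disabled, so it is skipped
})
print(cmd)
```

Using `shlex.join` instead of plain string concatenation means values with spaces (a prompt, a path) come out correctly quoted for a Linux shell.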
Once you build the CLI or server command, you can add your run info, save the run to the log, and track which configuration works best for your hardware.
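The run log idea can be sketched in a few lines of Python. This is only an assumption about how such a log might be structured, not how the tool stores it: the `Run` record, the `best_run` helper, and every number below are placeholders made up for the example, not real benchmark results.

```python
from dataclasses import dataclass

@dataclass
class Run:
    """One logged run: the command used plus the info you recorded for it."""
    command: str
    tokens_per_second: float
    notes: str = ""

def best_run(log):
    """Return the logged run with the highest generation speed."""
    return max(log, key=lambda run: run.tokens_per_second)

# Placeholder entries; the speeds are invented purely for illustration.
log = [
    Run("llama-server -m models/example.gguf --n-gpu-layers 20", 10.0, "partial offload"),
    Run("llama-server -m models/example.gguf --n-gpu-layers 99", 25.0, "full offload"),
    Run("llama-server -m models/example.gguf --n-gpu-layers 99 --ctx-size 32768", 18.0, "long context"),
]

winner = best_run(log)
print(winner.command, winner.tokens_per_second)
```

Tokens per second is only one possible ranking key; you could just as well sort by prompt processing speed or memory use if that matters more on your hardware.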
Only Linux is supported currently. Maybe Mac and Windows tabs in the future.