llama: avoid copying logits during prompt decode in MTP by am17an · Pull Request #23198 · ggml-org/llama.cpp
time to update your llama.cpp -> improved prompt processing speed
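The PR title says the speedup comes from not copying logits for prompt tokens during MTP decode. Below is a minimal, hedged sketch of that general pattern in C++: only rows whose per-token flag requests logits get copied out of the backend output, so a pure prompt-processing batch copies nothing. The struct and function names are illustrative only and are not llama.cpp's actual internals.

```cpp
// Sketch (not the actual llama.cpp code): during prompt decode, only tokens
// that explicitly request logits need a copy out of the compute buffer.
// Copying logits for every prompt token is wasted work, since the prompt
// phase usually needs logits for at most the last token.
#include <cstdint>
#include <cstring>
#include <vector>

struct batch_view {
    int32_t       n_tokens    = 0;
    const int8_t *want_logits = nullptr; // per-token flag: copy logits for this token?
};

// Copy only the requested rows from the backend output into the user buffer.
// Returns the number of rows copied (0 for a prompt-only batch).
static size_t copy_requested_logits(const batch_view  &batch,
                                    const float       *backend_logits, // [n_tokens * n_vocab]
                                    int32_t            n_vocab,
                                    std::vector<float> &out) {
    size_t n_copied = 0;
    for (int32_t i = 0; i < batch.n_tokens; ++i) {
        if (batch.want_logits && batch.want_logits[i]) {
            out.resize((n_copied + 1) * (size_t) n_vocab);
            std::memcpy(out.data() + n_copied * (size_t) n_vocab,
                        backend_logits + (size_t) i * n_vocab,
                        sizeof(float) * (size_t) n_vocab);
            ++n_copied;
        }
    }
    return n_copied;
}
```

For long prompts this skips a per-token memcpy of an n_vocab-sized row, which is where the reported prompt-processing gain would plausibly come from.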