sycl : port multi-column MMVQ from CUDA backend (~45% speculative decoding speedup on Intel Arc) by masonmilby · Pull Request #21845 · ggml-org/llama.cpp
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Saw this on another sub, so posting it here for Intel Arc card holders. It's a big boost, so update llama.cpp to b9519 or later.