r/LocalLLaMA · · 1 min read

sycl : port multi-column MMVQ from CUDA backend (~45% speculative decoding speedup on Intel Arc) by masonmilby · Pull Request #21845 · ggml-org/llama.cpp

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

sycl : port multi-column MMVQ from CUDA backend (~45% speculative decoding speedup on Intel Arc) by masonmilby · Pull Request #21845 · ggml-org/llama.cpp

Saw this on other sub so posting here.

For Intel ARC card holders. Big boost so update llama.cpp version(b9519 onwards)

submitted by /u/pmttyji
[link] [comments]

Discussion (0)

Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.

Sign in →

No comments yet. Sign in and be the first to say something.

More from r/LocalLLaMA