CUDA: add fast walsh-hadamard transform by am17an · Pull Request #23615 · ggml-org/llama.cpp
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
u/am17an implemented a fast Walsh-Hadamard transform (FWHT) for CUDA, which speeds up cases where the KV cache is quantized: a 1-2% boost on prompt processing (pp) and a 7-9% boost on text generation (tg). Performance on a 5090 with
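For readers unfamiliar with the transform, here is a minimal CPU reference sketch of an unnormalized, in-place FWHT in C++. It is not the CUDA kernel from the PR, and the function name `fwht_inplace` is hypothetical. It only illustrates the butterfly structure, which multiplies a length-n vector by the n×n Hadamard matrix in O(n log n) additions and subtractions instead of the O(n²) needed for a dense matrix product. It assumes the input length is a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unnormalized in-place fast Walsh-Hadamard transform (natural ordering).
// Illustrative helper only; the name and signature are assumptions,
// not taken from llama.cpp.
void fwht_inplace(std::vector<float>& x) {
    const std::size_t n = x.size();
    // The butterfly scheme below requires a power-of-two length.
    assert(n != 0 && (n & (n - 1)) == 0);

    // log2(n) passes; each pass combines pairs of elements h apart.
    for (std::size_t h = 1; h < n; h *= 2) {
        for (std::size_t i = 0; i < n; i += 2 * h) {
            for (std::size_t j = i; j < i + h; ++j) {
                const float a = x[j];
                const float b = x[j + h];
                x[j]     = a + b;  // sum branch of the butterfly
                x[j + h] = a - b;  // difference branch of the butterfly
            }
        }
    }
}
```

Because the transform is unnormalized, applying it twice returns the original vector scaled by n. Dividing by n after the second pass (or by √n after each pass) makes it its own inverse.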