llama.cpp releases

b9291

Mirrored from llama.cpp releases.

SYCL: improve MoE prefill throughput (#23142)

  • Change k_copy_src1_to_contiguous so that it uses a precomputed contiguous mapping in which all rows "owned" by an expert lie in one slice with known start and end offsets.
  • Replace the O(n_as * n_routed_rows) approach with a counting-sort-based procedure that runs in O(n_as + n_routed_rows), where n_as is the number of experts.
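The counting-sort grouping described in these notes can be sketched in Python. This is a minimal, hypothetical illustration and not the SYCL kernel. The function names, the `ids` layout (`ids[t][s]` is the expert picked by token t in routing slot s), and the (token, slot) row encoding are all assumptions. `build_expert_mapping_naive` stands in for an O(n_as * n_routed_rows) approach, because it rescans every routed row once per expert. `build_expert_mapping` works in three steps:

1. It counts rows per expert.
2. It takes an exclusive prefix sum to get each expert's slice start and end.
3. It scatters the rows into place, for O(n_as + n_routed_rows) total work.

```python
from typing import List, Tuple

Mapping = Tuple[List[int], List[int], List[Tuple[int, int]]]


def build_expert_mapping_naive(ids: List[List[int]], n_as: int) -> Mapping:
    """Hypothetical baseline: rescan all routed rows for every expert, O(n_as * n_routed_rows)."""
    starts: List[int] = []
    ends: List[int] = []
    order: List[Tuple[int, int]] = []
    for e in range(n_as):
        starts.append(len(order))
        for t, row in enumerate(ids):
            for s, x in enumerate(row):
                if x == e:
                    order.append((t, s))
        ends.append(len(order))
    return starts, ends, order


def build_expert_mapping(ids: List[List[int]], n_as: int) -> Mapping:
    """Group routed rows by expert with a counting sort, O(n_as + n_routed_rows).

    ids[t][s] is the expert chosen by token t in routing slot s (assumed layout).
    Rows owned by expert e occupy order[starts[e]:ends[e]].
    """
    # Pass 1: histogram of routed rows per expert.
    counts = [0] * n_as
    for row in ids:
        for e in row:
            counts[e] += 1

    # Pass 2: exclusive prefix sum gives each expert's slice start.
    starts = [0] * n_as
    total = 0
    for e in range(n_as):
        starts[e] = total
        total += counts[e]
    ends = [starts[e] + counts[e] for e in range(n_as)]

    # Pass 3: scatter each routed row into its expert's slice.
    # Visiting rows in (token, slot) order keeps each slice stable.
    cursor = starts.copy()
    order: List[Tuple[int, int]] = [(0, 0)] * total
    for t, row in enumerate(ids):
        for s, e in enumerate(row):
            order[cursor[e]] = (t, s)
            cursor[e] += 1
    return starts, ends, order


if __name__ == "__main__":
    # Three tokens, top-2 routing over four experts; expert 1 receives no rows.
    starts, ends, order = build_expert_mapping([[2, 0], [0, 2], [2, 3]], n_as=4)
    print(starts, ends)  # [0, 2, 2, 5] [2, 2, 5, 6]
    print(order)  # [(0, 1), (1, 0), (0, 0), (1, 1), (2, 0), (2, 1)]
```

The scatter visits rows in token order, so both functions return identical slices. The cheaper version can therefore replace the naive one without changing which rows each expert sees, or in what order.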
