llama.cpp releases

b9255

Mirrored from llama.cpp releases for archival readability. Support the source by reading on the original site.

hexagon: HMX quantized matmul rework (#23368)

  • hmx-mm: update debug logging in hmx-mm

  • hmx-mm: update dequant logic to use HVX_vector_x2/4

  • hmx-mm: remove non-pipelined version of the quantize matmul

It seems that we don't really need the non-pipelined version.

  • hmx-mm: use activation depth mode and update naming

Co-authored-by: Kim-Chyan Gan [email protected]

  • hex-mm: minor hmx matmul naming updates

  • hmx-mm: remove unused vars

  • snapdragon: scripts bump default ubatch-size to 1K

  • hexagon: combine HMX and power and clock settings into a single set_power call

  • hmx-mm: remove leftover of the scale repl helper

  • hexagon: fix editconf error


Co-authored-by: Kim-Chyan Gan [email protected]

