llama.cpp releases

b9510

Mirrored from llama.cpp releases for archival readability. Support the source by reading on the original site.

ggml: vectorize ggml_vec_dot_q4_1_q8_1 with WASM SIMD128 (#22209)

  • ggml: vectorize ggml_vec_dot_q4_1_q8_1 with WASM SIMD128

Optimize the inner loop of ggml_vec_dot_q4_1_q8_1_generic using
WASM SIMD128 intrinsics, gated behind #ifdef __wasm_simd128__ so
non-wasm builds are completely unaffected.

Approach:

  • single wasm_v128_load covers all 32 packed 4-bit weights
  • nibbles unpacked via AND/SHR into two u8x16 registers
  • widened to i16 before multiply (WASM SIMD has no i8*i8 instruction)
  • 4x wasm_i32x4_dot_i16x8 calls accumulate all 32 element pairs
  • horizontal reduce via 4x wasm_i32x4_extract_lane
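
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `sketch_*` types and functions are hypothetical stand-ins that use `float` fields where ggml stores half-precision values, the nibble layout (low nibble = element j, high nibble = element j + 16) and the per-block formula follow the generic scalar kernel as I understand it, and a scalar fallback is included so the same file also builds without `-msimd128`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__wasm_simd128__)
#include <wasm_simd128.h>
#endif

#define QK 32 /* elements per block, matching QK8_1 = 32 */

/* Hypothetical simplified block types (assumption: float instead of fp16). */
typedef struct { float d, m; uint8_t qs[QK / 2]; } sketch_q4_1; /* scale, min, nibbles */
typedef struct { float d, s; int8_t  qs[QK];     } sketch_q8_1; /* scale, d * sum(qs)  */

/* Integer dot product of one block of 32 element pairs. */
static int32_t sketch_block_sumi(const uint8_t *q4, const int8_t *q8) {
#if defined(__wasm_simd128__)
    const v128_t packed = wasm_v128_load(q4);                        /* 32 nibbles     */
    const v128_t lo = wasm_v128_and(packed, wasm_i8x16_splat(0x0F)); /* elements 0..15 */
    const v128_t hi = wasm_u8x16_shr(packed, 4);                     /* elements 16..31 */
    const v128_t y0 = wasm_v128_load(q8);
    const v128_t y1 = wasm_v128_load(q8 + 16);

    /* Widen both operands to i16 lanes, then four dot products of 8 pairs each. */
    v128_t acc = wasm_i32x4_dot_i16x8(wasm_u16x8_extend_low_u8x16(lo),
                                      wasm_i16x8_extend_low_i8x16(y0));
    acc = wasm_i32x4_add(acc, wasm_i32x4_dot_i16x8(wasm_u16x8_extend_high_u8x16(lo),
                                                   wasm_i16x8_extend_high_i8x16(y0)));
    acc = wasm_i32x4_add(acc, wasm_i32x4_dot_i16x8(wasm_u16x8_extend_low_u8x16(hi),
                                                   wasm_i16x8_extend_low_i8x16(y1)));
    acc = wasm_i32x4_add(acc, wasm_i32x4_dot_i16x8(wasm_u16x8_extend_high_u8x16(hi),
                                                   wasm_i16x8_extend_high_i8x16(y1)));

    /* Horizontal reduce across the four i32 lanes. */
    return wasm_i32x4_extract_lane(acc, 0) + wasm_i32x4_extract_lane(acc, 1) +
           wasm_i32x4_extract_lane(acc, 2) + wasm_i32x4_extract_lane(acc, 3);
#else
    /* Scalar fallback: same result, one element pair at a time. */
    int32_t sumi = 0;
    for (int j = 0; j < QK / 2; ++j) {
        sumi += (q4[j] & 0x0F) * q8[j];
        sumi += (q4[j] >> 4)   * q8[j + QK / 2];
    }
    return sumi;
#endif
}

/* Full dot product over nblocks blocks: scale the integer sum and add the
 * contribution of the q4_1 minimum, which uses the precomputed y.s. */
static float sketch_vec_dot_q4_1_q8_1(size_t nblocks, const sketch_q4_1 *x,
                                      const sketch_q8_1 *y) {
    assert(x != NULL && y != NULL);
    float sumf = 0.0f;
    for (size_t i = 0; i < nblocks; ++i) {
        const int32_t sumi = sketch_block_sumi(x[i].qs, y[i].qs);
        sumf += x[i].d * y[i].d * (float)sumi + x[i].m * y[i].s;
    }
    return sumf;
}
```

Widening the unpacked nibbles with the unsigned extend is safe here because the values 0..15 fit in a signed 16-bit lane, which is what `wasm_i32x4_dot_i16x8` expects.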

Benchmark (node v25, emcc -O3 -msimd128, 64 blocks x QK8_1=32,
200k iterations):

impl     ns/call   speedup
scalar   880.7     1.00x
simd     257.8     3.42x

Correctness verified against scalar reference across 10 random seeds
with exact output match.
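
A seeded exact-match check of this kind could look like the sketch below. It is not the harness used for the PR. Everything in it is hypothetical: `ref_sumi` is a plain scalar reference, `lanes_sumi` is a plain-C stand-in that mimics the SIMD data flow (unpack, then four partial sums of eight pairs), and the xorshift generator is just a convenient reproducible source of test blocks. It compares only the integer block sums, where exact equality is straightforward.

```c
#include <assert.h>
#include <stdint.h>

#define QK 32 /* elements per block */

/* Scalar reference: one multiply-accumulate per element pair. */
static int32_t ref_sumi(const uint8_t *q4, const int8_t *q8) {
    int32_t sumi = 0;
    for (int j = 0; j < QK / 2; ++j) {
        sumi += (q4[j] & 0x0F) * q8[j];
        sumi += (q4[j] >> 4)   * q8[j + QK / 2];
    }
    return sumi;
}

/* Candidate mirroring the SIMD data flow in plain C: unpack all nibbles
 * to 16-bit values, then accumulate four partial sums of 8 pairs each. */
static int32_t lanes_sumi(const uint8_t *q4, const int8_t *q8) {
    int16_t w[QK];
    for (int j = 0; j < QK / 2; ++j) {
        w[j]          = (int16_t)(q4[j] & 0x0F);
        w[j + QK / 2] = (int16_t)(q4[j] >> 4);
    }
    int32_t part[4] = {0, 0, 0, 0};
    for (int k = 0; k < 4; ++k) {
        for (int j = 0; j < 8; ++j) {
            part[k] += w[8 * k + j] * q8[8 * k + j];
        }
    }
    return part[0] + part[1] + part[2] + part[3];
}

/* xorshift32: small reproducible PRNG; the state must never be zero. */
static uint32_t next_u32(uint32_t *state) {
    uint32_t s = *state;
    s ^= s << 13;
    s ^= s >> 17;
    s ^= s << 5;
    *state = s;
    return s;
}

/* Runs both implementations on random blocks for each seed and returns
 * the number of blocks whose results differ. */
static int count_mismatches(uint32_t first_seed, int nseeds, int blocks_per_seed) {
    int bad = 0;
    for (int s = 0; s < nseeds; ++s) {
        uint32_t state = first_seed + (uint32_t)s;
        assert(state != 0);
        for (int b = 0; b < blocks_per_seed; ++b) {
            uint8_t q4[QK / 2];
            int8_t  q8[QK];
            for (int j = 0; j < QK / 2; ++j) {
                q4[j] = (uint8_t)(next_u32(&state) >> 24);
            }
            for (int j = 0; j < QK; ++j) {
                q8[j] = (int8_t)((int)(next_u32(&state) >> 24) - 128);
            }
            if (ref_sumi(q4, q8) != lanes_sumi(q4, q8)) {
                ++bad;
            }
        }
    }
    return bad;
}
```

Calling `count_mismatches(1, 10, 64)` exercises 10 seeds with 64 blocks each, echoing the seed count and block count quoted above. To test a real SIMD kernel, swap it in for `lanes_sumi`.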

  • ggml: move q4_1_q8_1 WASM SIMD implementation to wasm backend

Relocate the SIMD128 implementation of ggml_vec_dot_q4_1_q8_1 to ggml/src/ggml-cpu/arch/wasm/quants.c to follow the architecture-specific layout. Restore the generic implementation in ggml/src/ggml-cpu/quants.c.
Move the for loop into the else block.

  • ggml: use generic q4_1_q8_1 fallback in wasm backend

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)
