llama.cpp releases

b9122


ggml-webgpu: address precision issues for multimodal (#22808)

  • fix(mixed-types): use f32 for precision and update the shared memory calculation logic for f32 (see the accumulator sketch after this list)

  • fix(unary): correct the gelu, gelu_quick, and gelu_erf functions

  • fix(flash-attn-tile): fix the hardcoded V type

  • fix(flash_attn): fix tile path

  • fix: pass the editorconfig check and address the type conflicts

  • fix: remove redundant pipeline keys

  • fix: remove inline min/max group size functions and revert the flash attn path order

  • fix: use clamp to avoid NaN in GELU (see the clamped-GELU sketch after this list)

  • fix: use the right range for exp; a bound of 80 is safer for f32 exp
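The first bullet's "use f32 for precision" is the standard mixed-precision pattern: values may be stored in a narrower type, but partial sums are accumulated in a wider one so rounding error does not swallow small addends. Portable C++ has no f16 type, so this sketch (an illustration, not the actual WGSL shader code) shows the same effect one precision level up: an f32 accumulator silently stops growing once the running sum reaches 2^24, while a wider accumulator over the same inputs stays exact.

```cpp
#include <cstdio>

int main() {
    const long n = 1L << 25; // 33,554,432 addends of 1.0

    // Narrow accumulator: once the sum reaches 2^24 = 16777216, adding 1.0f
    // no longer changes it, because 1.0 falls below f32's resolution at
    // that magnitude and rounds away.
    float  sum_narrow = 0.0f;
    // Wide accumulator: the same inputs, summed with more precision.
    double sum_wide   = 0.0;

    for (long i = 0; i < n; ++i) {
        sum_narrow += 1.0f;
        sum_wide   += 1.0;
    }

    std::printf("narrow accumulator: %.1f\n", sum_narrow); // 16777216.0
    std::printf("wide accumulator:   %.1f\n", sum_wide);   // 33554432.0
}
```

Accumulating in f32 while storing operands in f16 is the analogous trade the fix makes: the extra accumulator width costs registers, not memory bandwidth.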
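The last two bullets describe numerical guards rather than logic changes. std::exp overflows f32 for arguments above roughly 88.7 (ln of FLT_MAX), so clamping the exponent to ±80 leaves headroom and keeps the tanh-form GELU finite, where an unclamped version produces inf/inf = NaN for large inputs. A minimal C++ sketch of the idea; the real fix lives in the WGSL shaders, and the constant names and exact clamp bound here are assumptions:

```cpp
#include <cmath>
#include <cstdio>

// Illustrative constants; names and the exact guard value are assumptions,
// not copied from the ggml-webgpu shaders.
constexpr float GELU_COEF_A     = 0.044715f;
constexpr float SQRT_2_OVER_PI  = 0.7978845608028654f;
constexpr float EXP_F32_MAX_ARG = 80.0f; // exp overflows f32 near 88.7; 80 leaves headroom

// tanh written via exp, as a shader without a reliable native tanh might do.
// Without the clamp, exp(2a) becomes inf for large a, and inf/inf yields NaN.
static float tanh_clamped(float a) {
    float t = 2.0f * a;
    if (t >  EXP_F32_MAX_ARG) t =  EXP_F32_MAX_ARG;
    if (t < -EXP_F32_MAX_ARG) t = -EXP_F32_MAX_ARG;
    float e = std::exp(t);
    return (e - 1.0f) / (e + 1.0f);
}

// GELU, tanh approximation: 0.5*x*(1 + tanh(sqrt(2/pi)*(x + 0.044715*x^3))).
static float gelu_f32(float x) {
    float inner = SQRT_2_OVER_PI * x * (1.0f + GELU_COEF_A * x * x);
    return 0.5f * x * (1.0f + tanh_clamped(inner));
}

int main() {
    // Large-magnitude inputs stay finite instead of collapsing to NaN.
    const float xs[] = {-1000.0f, -1.0f, 0.0f, 1.0f, 1000.0f};
    for (float x : xs) {
        std::printf("gelu(%g) = %g\n", x, gelu_f32(x));
    }
}
```

Clamping the argument rather than the result is what prevents the NaN: once the exponential itself is finite, the ratio degrades gracefully to ±1.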

Release assets: macOS/iOS, Linux, Android, Windows, openEuler.
