llama.cpp releases · May 21, 2026

b9260

Mirrored from llama.cpp releases for archival readability. Support the source by reading on the original site.


opencl: refactor backend initialization (#23318)

opencl: refactor initialization
opencl: refactor GPU identification
opencl: rename for consistency
opencl: cache global mem size in dev_ctx
opencl: adjust log level
opencl: load argsort and flash_attn kernels in supports_op
  The argsort kernel must be built in supports_op so that the max workgroup size can be queried.
  The flash_attn kernel has many variants, so they are loaded only when needed.
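
The last item combines two loading strategies: one kernel is built as soon as a capability query needs its properties, while kernels with many variants are built on first use. The sketch below illustrates that pattern in isolation. It is not the llama.cpp implementation: `Kernel`, `LazyKernelCache`, `supports_argsort`, `fake_build`, the variant name `flash_attn_f16_d128`, the limit of 256, and the rule comparing row length to workgroup size are all hypothetical, chosen only to make the idea concrete.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a compiled OpenCL kernel; not a ggml type.
struct Kernel {
    std::string name;
    std::size_t max_workgroup_size;  // assumed to be known only after the build
};

// Hypothetical cache that builds a kernel the first time it is requested
// and returns the cached instance on every later request.
class LazyKernelCache {
public:
    using BuildFn = std::function<Kernel(const std::string &)>;

    explicit LazyKernelCache(BuildFn build) : build_(std::move(build)) {
        assert(static_cast<bool>(build_));  // a build callback is required
    }

    const Kernel & get(const std::string & name) {
        auto it = kernels_.find(name);
        if (it == kernels_.end()) {
            // First request for this kernel: build it now and cache it.
            it = kernels_.emplace(name, build_(name)).first;
        }
        return it->second;
    }

    bool is_built(const std::string & name) const {
        return kernels_.count(name) != 0;
    }

    std::size_t built_count() const { return kernels_.size(); }

private:
    BuildFn build_;
    std::map<std::string, Kernel> kernels_;
};

// Hypothetical capability check. The answer depends on a property of the
// built kernel, so the query itself triggers the build. The comparison
// against row_len is an assumed rule for illustration only.
inline bool supports_argsort(LazyKernelCache & cache, std::size_t row_len) {
    return row_len <= cache.get("argsort").max_workgroup_size;
}

// Stub builder used for demonstration; a real backend would compile
// OpenCL source here and query the device for the workgroup limit.
inline Kernel fake_build(const std::string & name) {
    return Kernel{name, 256};
}
```

With this layout, calling `supports_argsort` on a fresh cache builds only the argsort entry, and a flash_attn variant is built only when `get` is called with its name, so unused variants never incur a compile cost.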

macOS/iOS:

Linux:

Android:

Android arm64 (CPU)

Windows:

openEuler:
