llama.cpp releases · May 29, 2026

b9411

#model-release #version-bump

Mirrored from llama.cpp releases for archival readability. Support the source by reading on the original site.


model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (#23346)

llama : support DeepSeek V3.2 model family (with DSA lightning indexer)
convert : handle DeepseekV32ForCausalLM architecture
ggml : support for f16 GGML_OP_FILL
memory : separate hparams argument in llama_kv_cache constructor
memory : add llama_kv_cache_dsa memory (KV cache + lightning indexer cache)
llama : support for LLM_ARCH_DEEPSEEK32
model : llama_model_deepseek32 implementation
model : merge two scale operations into one in DSA lightning indexer implementation
chore : remove unused code
model : support NVFP4 in DeepSeek V3.2

Co-authored-by: Sigbjørn Skjæret

memory : refactoring TODO

Co-authored-by: ggerganov

Co-authored-by: Stanisław Szymczyk
Co-authored-by: Sigbjørn Skjæret
Co-authored-by: ggerganov
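For readers unfamiliar with the DSA lightning indexer named in the commits above, the following is a minimal sketch of the general idea, based on how the technique is usually described: for each query token, a small indexer scores every cached token as a weighted sum of ReLU-ed dot products across indexer heads, and only the top-k scoring tokens take part in attention. This is not the llama.cpp implementation. The function names, the plain-list data layout, and the toy values are all hypothetical, and the assumption that the separate lightning indexer cache holds the per-token indexer keys is a guess from the commit titles.

```python
# Illustrative sketch only: NOT llama.cpp code. All names are hypothetical.

def relu(x):
    """Clamp negative values to zero."""
    return x if x > 0.0 else 0.0

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def index_scores(q_heads, w_heads, keys):
    """Score every cached token for a single query token.

    q_heads: one indexer query vector per indexer head
    w_heads: one scalar weight per indexer head
    keys:    one indexer key vector per cached token (shared by all heads)
    """
    return [
        sum(w * relu(dot(q, k)) for q, w in zip(q_heads, w_heads))
        for k in keys
    ]

def select_top_k(scores, k):
    """Return positions of the k highest-scoring tokens, in position order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

# Toy data: 2 indexer heads, 4 cached tokens, 2-dimensional indexer vectors.
q_heads = [[1.0, 0.0], [0.0, 1.0]]
w_heads = [0.5, 2.0]
keys = [[1.0, 1.0], [-1.0, 0.0], [0.0, 3.0], [2.0, -1.0]]

scores = index_scores(q_heads, w_heads, keys)
selected = select_top_k(scores, 2)  # attention would then attend only to these positions
print(scores, selected)
```

In this toy setup the full attention step would run over the two selected positions instead of all four, which is where the sparsity comes from.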

macOS/iOS:

macOS Apple Silicon (arm64)
macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED
macOS Intel (x64)
iOS XCFramework

Linux:

Android:

Android arm64 (CPU)

Windows:

openEuler:

DISABLED
openEuler x86 (310p)
openEuler x86 (910b, ACL Graph)
openEuler aarch64 (310p)
openEuler aarch64 (910b, ACL Graph)

UI:

UI
