llama.cpp releases · 1 min read

b9411

Mirrored from llama.cpp releases for archival readability. Support the source by reading on the original site.

model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (#23346)

  • llama : support DeepSeek V3.2 model family (with DSA lightning indexer)

  • convert : handle DeepseekV32ForCausalLM architecture

  • ggml : support for f16 GGML_OP_FILL

  • memory : separate hparams argument in llama_kv_cache constructor

  • memory : add llama_kv_cache_dsa memory (KV cache + lightning indexer cache)

  • llama : support for LLM_ARCH_DEEPSEEK32

  • model : llama_model_deepseek32 implementation

  • model : merge two scale operations into one in DSA lightning indexer implementation

  • chore : remove unused code

  • model : support NVFP4 in DeepSeek V3.2

Co-authored-by: Sigbjørn Skjæret

  • memory : refactoring TODO

Co-authored-by: ggerganov
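For readers unfamiliar with the "lightning indexer" named in the entries above, the following is a minimal sketch of the idea as it is publicly described for DeepSeek Sparse Attention: a small indexer scores every cached token for the current query, and only the top-k tokens are passed on to the attention step. This is an illustration under stated assumptions, not the llama.cpp implementation; the function names, argument layout, and pure-Python lists are all hypothetical.

```python
# Illustrative sketch only: names and data layout are assumptions,
# not taken from the llama.cpp source.

def index_scores(q_heads, weights, keys):
    """Score every cached token for one query token.

    q_heads: one indexer query vector per indexer head
    weights: one scalar weight per indexer head
    keys:    one indexer key vector per cached token
    """
    scores = []
    for k in keys:
        s = 0.0
        for q, w in zip(q_heads, weights):
            dot = sum(a * b for a, b in zip(q, k))
            s += w * max(dot, 0.0)  # ReLU on the per-head dot product
        scores.append(s)
    return scores


def select_top_k(scores, k):
    """Return the positions of the k highest-scoring tokens, in position order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])


if __name__ == "__main__":
    q_heads = [[1.0, 0.0], [0.0, 1.0]]
    weights = [1.0, 0.5]
    keys = [[1.0, 0.0], [0.0, 2.0], [-1.0, -1.0], [3.0, 3.0]]
    scores = index_scores(q_heads, weights, keys)
    print(scores)
    print(select_top_k(scores, 2))
```

The main attention would then attend only to the selected positions. Because the indexer has its own keys, a separate cache is needed for them alongside the regular KV cache, which is presumably the role of the llama_kv_cache_dsa memory listed above.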


Co-authored-by: Stanisław Szymczyk
Co-authored-by: Sigbjørn Skjæret
Co-authored-by: ggerganov
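The entry "merge two scale operations into one" refers to a common graph simplification: two consecutive multiplications by constants can be replaced by one multiplication by their product. The sketch below shows only the arithmetic identity; the release note does not state the actual factors, so `a` and `b` are hypothetical values and the helper names are invented for illustration.

```python
# Illustrative sketch only: a and b are hypothetical scale factors.

def scale(xs, s):
    """Multiply every element by the constant s."""
    return [x * s for x in xs]


def two_scales(xs, a, b):
    """Two passes over the data: scale by a, then by b."""
    return scale(scale(xs, a), b)


def merged_scale(xs, a, b):
    """One pass over the data: scale by the precomputed product a * b."""
    return scale(xs, a * b)


if __name__ == "__main__":
    xs = [1.0, -2.0, 8.0]
    print(two_scales(xs, 0.5, 0.25))
    print(merged_scale(xs, 0.5, 0.25))
```

The merged form saves one elementwise pass and one intermediate tensor. With arbitrary floating-point factors the two forms can differ by a rounding error, so a comparison should generally use a tolerance rather than exact equality.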

macOS/iOS:

Linux:

Android:

Windows:

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)

UI:
