llama.cpp releases

455 articles archived

llama.cpp releases dev-tools 1mo ago

b9409

sync : ggml. macOS/iOS: macOS Apple Silicon (arm64), macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED, macOS Intel (x64), iOS XCFramework. Linux: Ubuntu x64 (CPU), Ubuntu arm64 (CPU), Ubuntu s390x (CPU), Ubuntu x64 (Vulkan), Ubuntu arm64 (Vulkan), Ubuntu x64 (ROCm 7.2), Ubuntu x64…

llama.cpp releases dev-tools 1mo ago

b9406

llama: add llm_graph_input_mtp (#23643). rename input_mtp -> input_token_embd; add TODO about mtmd embedding; cont : clean-up. Co-authored-by: Georgi Gerganov

llama.cpp releases dev-tools 1mo ago

b9405

app : move licences to llama-app (#23824). Signed-off-by: Adrien Gallouët

llama.cpp releases dev-tools 1mo ago

b9403

meta : Add missing buffer set in allreduce fallback !COMPUTE clear (#23480). Without this, at least the Vulkan backend skips the * 0 for !COMPUTE tensors, causing corrupt output.

llama.cpp releases dev-tools 1mo ago

b9402

hexagon: basic/generic op fusion support and RMS_NORM+MUL fusion (#23835). Updates the infrastructure to enable op fusion, using RMS_NORM+MUL as the use case.

llama.cpp releases dev-tools 1mo ago

b9401

mtmd-debug: add color and rainbow mode (#23829). fix M_PI; max_dist.

llama.cpp releases dev-tools 1mo ago

b9400

mtmd: fix gemma 4 projector pre_norm (#23822).

llama.cpp releases dev-tools 1mo ago

b9399

opencl: move backend info printing into its own function (#23702). opencl: move new log line; opencl: fix for non-Adreno path.

llama.cpp releases dev-tools 1mo ago

b9404

cuda : disables launch_fattn PDL enrollment due to compiler bug (#23825).

llama.cpp releases dev-tools 1mo ago

b9395

app : improve help output (#23805). Signed-off-by: Adrien Gallouët

llama.cpp releases dev-tools 1mo ago

b9394

mtmd: n_head_kv defaults to n_head (#23782). Removed AI-generated comment.

llama.cpp releases dev-tools 1mo ago

b9393

mtmd: fix gemma 4 audio rms norm eps (#23815). Update tools/mtmd/clip.cpp. Co-authored-by: Sigbjørn Skjæret

llama.cpp releases dev-tools 1mo ago

b9391

arg: Add LLAMA_ARG_API_KEY_FILE environment variable for --api-key-file (#23167).

llama.cpp releases dev-tools 1mo ago

b9389

ggml: auto apply iGPU flag CUDA/HIP if integrated device (#23007).

llama.cpp releases dev-tools 1mo ago

b9388

mmvq Optim: add MMVQ_PARAMETERS_TURING(mmvq_parameter_table_id) for … (#23729). Adds MMVQ_PARAMETERS_TURING(mmvq_parameter_table_id) for SM75 (Turing) to avoid a mismatch when Turing device code is JIT-compiled for Ampere or newer. Co-authored-by: Johannes Gäßler…

llama.cpp releases dev-tools 1mo ago

b9387

CUDA: route batch>=4 quantized matmul to MMQ on AMD MFMA hardware (#23227). CUDA: per-quant MMVQ/MMQ batch threshold on AMD MFMA hardware. The dispatcher uses a single global threshold (MMVQ_MAX_BATCH_SIZE = 8) to choose between mul_mat_vec_q (per-row GEMV) and mul_mat_q…
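A rough way to picture this dispatch: the batch size is compared against a threshold, with mul_mat_vec_q (MMVQ) used at or below it and mul_mat_q (MMQ) above it. The Python sketch below is a hypothetical model, not the ggml CUDA code. Only MMVQ_MAX_BATCH_SIZE = 8 and the batch>=4 routing on AMD MFMA hardware come from the release note; the function name choose_matmul_kernel and the per-quant override table are illustrative assumptions.

```python
# Hypothetical model of the MMVQ/MMQ dispatch described in b9387 (#23227).
# From the release note: the global MMVQ_MAX_BATCH_SIZE = 8 and "batch >= 4 -> MMQ"
# on AMD MFMA hardware. The per-quant override table is an illustrative assumption.

MMVQ_MAX_BATCH_SIZE = 8  # global limit: MMVQ handles batches up to this size

# Assumed per-quant maximum MMVQ batch size on AMD MFMA hardware
# (illustrative override only, not a real tuning value).
MFMA_MMVQ_MAX_BATCH: dict[str, int] = {
    "q6_K": 2,
}
MFMA_DEFAULT_MAX_BATCH = 3  # i.e. batch >= 4 goes to MMQ unless overridden


def choose_matmul_kernel(batch_size: int, quant_type: str, amd_mfma: bool) -> str:
    """Return "mmvq" (per-row GEMV) or "mmq" (tiled quantized matmul)."""
    if amd_mfma:
        limit = MFMA_MMVQ_MAX_BATCH.get(quant_type, MFMA_DEFAULT_MAX_BATCH)
    else:
        limit = MMVQ_MAX_BATCH_SIZE
    return "mmvq" if batch_size <= limit else "mmq"
```

With these assumed values, choose_matmul_kernel(4, "q4_0", amd_mfma=True) selects "mmq", while the same call with amd_mfma=False stays on "mmvq" because the global limit of 8 applies.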

llama.cpp releases dev-tools 1mo ago

b9386

server: minor tweaks to use more cpp features (#23785). misc(server): add default port to impl RAII; misc(server): register_gcp_compat() can be const; misc(server): use proper cpp const/auto methods; misc(server): do not reset a unique_ptr, use make_unique instead to be exception…

llama.cpp releases dev-tools 1mo ago

b9384

vulkan: fast path for walsh-hadamard transform (#23687). Disabled for Intel due to a segfault.

llama.cpp releases dev-tools 1mo ago

b9383

chat : add Granite 4.1 chat template (#23518).

llama.cpp releases dev-tools 1mo ago

b9382

vulkan: fix wrong index variable in inner loop (#23665).

llama.cpp releases dev-tools 1mo ago

b9381

vulkan: Fix memory logger unsafe iterator access (#23667).

llama.cpp releases dev-tools 1mo ago

b9380

server, ui : Add support for HTTP ETags in llama-server (#23701). Allow caching of UI elements in llama-server; use fnv_hash; update tools/server/server-http.cpp; the ETag has to be set always. Co-authored-by: Xuan-Son Nguyen…
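The general ETag mechanism can be sketched as follows: hash the asset body, send the hash as the ETag header on every response, and answer 304 Not Modified when the client's If-None-Match header already carries that tag. This Python sketch assumes 64-bit FNV-1a and a quoted hex ETag; the hash variant and tag format used by llama-server may differ, and the helpers fnv1a_64, make_etag and serve_static are hypothetical.

```python
from typing import Optional

# Assumed: 64-bit FNV-1a constants (standard offset basis and prime).
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte into the hash, then multiply by the prime."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF  # keep 64 bits
    return h


def make_etag(body: bytes) -> str:
    """Strong ETag: the hex hash wrapped in double quotes (assumed format)."""
    return f'"{fnv1a_64(body):016x}"'


def serve_static(body: bytes, if_none_match: Optional[str]) -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, body) for a hypothetical static-asset handler."""
    etag = make_etag(body)
    headers = {"ETag": etag}  # set on both 200 and 304 responses
    if if_none_match is not None:
        client_tags = [t.strip() for t in if_none_match.split(",")]
        if etag in client_tags:
            return 304, headers, b""  # client copy is current; no body needed
    return 200, headers, body
```

A browser that cached the UI sends the stored tag back in If-None-Match; when the asset is unchanged, the handler skips the body entirely.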

llama.cpp releases dev-tools 1mo ago

b9378

cuda : fix KQ mask offset integer overflow in fattn MMA kernel (#23610). Co-authored-by: Stanisław Szymczyk

llama.cpp releases dev-tools 1mo ago

b9377

perplexity : fix format specifier in LOG_ERR (#23788). Signed-off-by: Adrien Gallouët

llama.cpp releases dev-tools 1mo ago

b9375

ggml: fixed Arm SVE usage bug in vec.h, vec.cpp (#22841). Updated vec.h/vec.cpp code to accumulate to F32 rather than F16. Change-Id: I0cb789347f2bf60ffaf9047319f727e788c825f8. Signed-off-by: Martin Klacer. Co-authored-by: Milos Puzovic…
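Why the accumulator width matters: F16 has an 11-bit significand, so integers above 2048 are no longer all representable, and a running F16 sum can stop growing. The Python sketch below is a generic numerical illustration, not the Arm SVE code in ggml; it emulates each precision by rounding through the standard struct module's "e" (binary16) and "f" (binary32) formats after every addition.

```python
import struct
from typing import Callable, Iterable


def round_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_f32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(values: Iterable[float], rounding: Callable[[float], float]) -> float:
    """Sum values, rounding the accumulator after each add, as a narrow register would."""
    acc = 0.0
    for v in values:
        acc = rounding(acc + v)
    return acc


ones = [1.0] * 3000
print(accumulate(ones, round_f16))  # 2048.0: 2048 + 1 rounds back to 2048 in F16
print(accumulate(ones, round_f32))  # 3000.0: F32 represents these integers exactly
```

Real kernels sum products rather than ones, but the effect is the same: once the F16 accumulator is large relative to each term, small contributions are rounded away.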

llama.cpp releases dev-tools 1mo ago

b9374

ci : refactor (#23789). ci : separate CUDA windows workflow + fix names; ci : rename workflow; ci : prefix cache names with workflow name; ci : rename build.yml -> build-cpu.yml; ci : cache keys; ci : fix windows cuda/hip concurrency of release workflow; ci : fix apple cache names; ci…

llama.cpp releases dev-tools 1mo ago

b9371

ggml-webgpu: remove legacy constants (#23672).

llama.cpp releases dev-tools 1mo ago

b9370

hexagon: add support for Q4_1 in MUL_MAT and MUL_MAT_ID (#23647). hex-mm: add support for Q4_1 matmul/matvec, HVX-only for now; hmx-mm: add support for Q4_1; hex-mm: use Q8_1 dynamic quantization to avoid having to compute sums in the vec_dot; hexagon: fix repack scratch buffer…

llama.cpp releases dev-tools 1mo ago

b9368

vulkan: Switch MUL_MAT_VEC to 4 K per iteration for F16/32 (#22887). Against mesa git, this shows a 4.8% performance improvement for tg128 on Qwen3.5-9B:BF16 on Intel BMG. Note that this breaks some tests until the last…

llama.cpp releases dev-tools 1mo ago

b9369

ggml-webgpu: Fix how to dispatch WG to some ops (#23750).

llama.cpp releases dev-tools 1mo ago

b9367

vulkan: use GL_NV_cooperative_matrix_decode_vector for faster matmul (#23541).

llama.cpp releases dev-tools 1mo ago

b9366

vulkan: add REPEAT op support for f16 to f16 (#23298). feat: extend repeat op for vulkan; feat: add repeat_f16 vulkan pipeline; fix: ensure same dst and src types; fix: use type_size instead of data types; fix: use int16 and int32 for repeat shader op; chore: rename repeat_f* to…

llama.cpp releases dev-tools 1mo ago

b9365

ci : move ARM jobs to self-hosted + disable kleidiai mac release (#23780). ci : move ARM jobs to 3rd-party runners + disable kleidiai release; cont : fix deps + fix names; ocd : fix names; cont : fix PR links.

llama.cpp releases dev-tools 1mo ago

b9360

common : fix env names to all have LLAMA_ARG_ prefix (#23778).

llama.cpp releases dev-tools 1mo ago

b9357

vulkan: avoid preferring transfer queue on AMD UMA devices (#22455).

llama.cpp releases dev-tools 1mo ago

b9354

convert: add MiniCPM5 tokenizer support (#23384). Add minicpm5 pre-tokenizer hash via convert_hf_to_gguf_update.py and implement hardcoded regex handling in llama-vocab.cpp, consistent with other BPE pre-tokenizers. Co-authored-by: zhangtao

llama.cpp releases dev-tools 1mo ago

b9353

server : fix the log message when using SSL (#23393). When llama-server is started with an SSL key and cert, the log says that it listens on http instead of https. This patch fixes the message.

llama.cpp releases dev-tools 1mo ago

b9352

ggml-zendnn : fixed naming of matmul function (#20964). ggml-zendnn: fixed naming of mul_mat_id function; ggml-zendnn: fixed print in mul_mat_id. Co-authored-by: plotnikov.v10

llama.cpp releases dev-tools 1mo ago

b9351

macOS/iOS: macOS Apple Silicon (arm64), macOS Apple Silicon (arm64, KleidiAI enabled), macOS Intel (x64), iOS XCFramework. Linux: Ubuntu x64 (CPU), Ubuntu arm64 (CPU), Ubuntu s390x (CPU), Ubuntu x64 (Vulkan), Ubuntu arm64 (Vulkan), Ubuntu x64 (ROCm 7.2), Ubuntu x64 (OpenVINO), Ubuntu x64…

llama.cpp releases dev-tools 1mo ago

b9334

CUDA: missing PDL sync for FWHT, better fallback (#23690).

llama.cpp releases dev-tools 1mo ago

b9333

metal : add apple device id (#23566). Co-authored-by: lvyichen

llama.cpp releases dev-tools 1mo ago

b9331

ci : reduce PR jobs by matching backend paths (#23675). ci : disable SYCL f16 builds; ci : extract android and hip into separate workflows; ci : move webgpu to separate workflow; ci : move the rpc to a separate workflow; ci : extract s390x and ppcl jobs; ci : extract opencl job into…

llama.cpp releases dev-tools 1mo ago

b9341: convert : support Gemma4ForCausalLM architecture (#23682)

convert : support Gemma4ForCausalLM architecture (#23674). fix indent. Co-authored-by: Oleg Afonin. Co-authored-by: Sigbjørn Skjæret

llama.cpp releases dev-tools 1mo ago

b9330

model: tag ffn_latent as MUL_MAT to fix buft probe (#23664). ffn_latent_down/up are declared GGML_OP_MUL in LLM_TENSOR_INFOS, but nemotron-h feeds them through ggml_mul_mat. The loader buft probe asks the backend about the declared op, so it tested an elementwise MUL on a q8_0…

llama.cpp releases dev-tools 1mo ago

b9329

CUDA: add fast walsh-hadamard transform (#23615). review: add unrolls + change size_t -> int; warp size 64. Co-authored-by: Johannes Gäßler
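For reference, the fast Walsh-Hadamard transform is the standard butterfly algorithm: log2(n) passes, each replacing a pair (x, y) with (x + y, x - y), for O(n log n) work instead of the O(n²) of a dense Hadamard matrix product. The Python sketch below is a generic unnormalized version in natural (Hadamard) order, not the CUDA kernel from this release; whether the ggml kernels also scale by 1/sqrt(n) is not stated here.

```python
def fwht(values: list[float]) -> list[float]:
    """Unnormalized fast Walsh-Hadamard transform; length must be a power of two."""
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Each pass combines elements h apart inside blocks of size 2h.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y
        h *= 2
    return a


print(fwht([1, 0, 1, 0, 0, 1, 1, 0]))  # [4, 2, 0, -2, 0, 2, 0, 2]
```

Applying the unnormalized transform twice returns the input scaled by n, which makes a convenient self-check for any implementation.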

llama.cpp releases dev-tools 1mo ago

b9326

sync : ggml

llama.cpp releases dev-tools 1mo ago

b9320

TP: fix ggml context size calculation (#22616). TP: fix ggml context size calculation, memory leak; move split state cache back into the context; revert to constant ggml context size for cgraphs; increase headroom for statically allocated tensors; remove obsolete include.

llama.cpp releases dev-tools 1mo ago

b9319

ggml: gguf_init_from_callback and gguf_init_from_buffer (#22341). ggml: implement gguf_init_from_buffer; test: gguf_init_from_buffer; fix: memory breakdown for a model loaded with no_alloc from a file is consistent with being loaded from a buffer; fix: use GGML_UNUSED…

llama.cpp releases dev-tools 1mo ago

b9318

server: MTP layer kv-cache should respect draft type ctk (#23646).

llama.cpp releases dev-tools 1mo ago

b9315

llama : document that only one on-device state can be saved per sequence (#23520).

Prismix · © 2026 · AI Hub

All product names and logos are trademarks of their respective owners.
