llama.cpp releases
455 articles archived
llama.cpp releases dev-tools 1mo ago
b9409
sync : ggml. macOS/iOS: macOS Apple Silicon (arm64), macOS Apple Silicon (arm64, KleidiAI enabled): DISABLED, macOS Intel (x64), iOS XCFramework; Linux: Ubuntu x64 (CPU), Ubuntu arm64 (CPU), Ubuntu s390x (CPU), Ubuntu x64 (Vulkan), Ubuntu arm64 (Vulkan), Ubuntu x64 (ROCm 7.2), Ubuntu x64…
b9406
llama: add llm_graph_input_mtp (#23643): rename input_mtp -> input_token_embd; add TODO about mtmd embedding; cont : clean-up. Co-authored-by: Georgi Gerganov
b9405
app : move licences to llama-app (#23824). Signed-off-by: Adrien Gallouët
b9403
meta : Add missing buffer set in allreduce fallback !COMPUTE clear (#23480). Without this, at least the Vulkan backend will skip the multiply-by-zero (* 0) for !COMPUTE tensors, causing corrupt output.
b9402
hexagon: basic/generic op fusion support and RMS_NORM+MUL fusion (#23835). Updates the infrastructure to enable op fusion, with RMS_NORM+MUL as the use case.
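To illustrate what fusing RMS_NORM with the following MUL buys, here is a minimal scalar sketch (not the Hexagon kernel): the unfused path materializes the normalized vector before multiplying by the weight, while the fused path does one reduction pass and one output pass with no intermediate buffer. The `eps` default is an arbitrary illustrative value, not taken from llama.cpp.

```python
import math


def rms_norm(x, eps=1e-6):
    # RMS_NORM: scale each element by 1 / sqrt(mean(x^2) + eps)
    inv = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv for v in x]


def mul(a, w):
    # MUL: elementwise product with a weight vector
    return [p * q for p, q in zip(a, w)]


def rms_norm_mul_fused(x, w, eps=1e-6):
    # Fused version: compute the scale once, then apply scale and weight
    # in the same output pass instead of writing an intermediate tensor.
    inv = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv * q for v, q in zip(x, w)]
```

Both paths compute the same values; fusion only removes one write and one read of the intermediate result.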
b9401
mtmd-debug: add color and rainbow mode (#23829): fix M_PI; max_dist
b9400
mtmd: fix gemma 4 projector pre_norm (#23822)
b9399
opencl: move backend info printing into its own function (#23702): move new log line; fix for non-Adreno path
b9404
cuda : disables launch_fattn PDL enrollment due to compiler bug (#23825)
b9395
app : improve help output (#23805). Signed-off-by: Adrien Gallouët
b9394
mtmd: n_head_kv defaults to n_head (#23782). Removed AI-generated comment.
b9393
mtmd: fix gemma 4 audio rms norm eps (#23815): update tools/mtmd/clip.cpp. Co-authored-by: Sigbjørn Skjæret
b9391
arg: Add LLAMA_ARG_API_KEY_FILE environment variable for --api-key-file (#23167)
b9389
ggml: auto apply iGPU flag CUDA/HIP if integrated device (#23007)
b9388
mmvq Optim: add MMVQ_PARAMETERS_TURING(mmvq_parameter_table_id) for … (#23729): add MMVQ_PARAMETERS_TURING(mmvq_parameter_table_id) for SM75 TURING; avoid a mismatch for JIT compilation of Turing device code for Ampere or newer. Co-authored-by: Johannes Gäßler…
b9387
CUDA: route batch>=4 quantized matmul to MMQ on AMD MFMA hardware (#23227): per-quant MMVQ/MMQ batch threshold on AMD MFMA hardware. The dispatcher uses a single global threshold (MMVQ_MAX_BATCH_SIZE = 8) to choose between mul_mat_vec_q (per-row GEMV) and mul_mat_q…
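The dispatch described above can be sketched as a threshold lookup. This is an illustration, not ggml-cuda code: the `<=` comparison against MMVQ_MAX_BATCH_SIZE is an assumption, and the per-quant table `MFMA_MAX_MMVQ_BATCH` and its entries are hypothetical, chosen only so that batch >= 4 goes to MMQ on MFMA hardware, as the title states.

```python
MMVQ_MAX_BATCH_SIZE = 8  # global threshold quoted in the release note

# Hypothetical per-quant limits on AMD MFMA hardware; the real table is not
# given in the release note. Batches above the limit are routed to MMQ.
MFMA_MAX_MMVQ_BATCH = {"q4_0": 3, "q8_0": 3}


def choose_kernel(batch_size, quant_type, amd_mfma):
    max_mmvq = MMVQ_MAX_BATCH_SIZE
    if amd_mfma:
        # Fall back to the global threshold for quant types without an entry.
        max_mmvq = MFMA_MAX_MMVQ_BATCH.get(quant_type, MMVQ_MAX_BATCH_SIZE)
    # Small batches: per-row GEMV kernel; larger batches: tiled matmul kernel.
    return "mul_mat_vec_q" if batch_size <= max_mmvq else "mul_mat_q"
```

With a single global value, every quant type switches kernels at the same batch size; a per-quant table lets each type switch where its MMQ path starts to win.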
b9386
server: minor tweaks to use more cpp features (#23785): misc(server): add default port to impl RAII; misc(server): register_gcp_compat() can be const; misc(server): use proper cpp const/auto methods; misc(server): do not reset a unique_ptr, use make_unique instead to be exception…
b9384
vulkan: fast path for walsh-hadamard transform (#23687): disable for Intel due to segfault
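For reference, the fast Walsh-Hadamard transform (added here for Vulkan and in b9329 for CUDA) is an O(n log n) butterfly over a power-of-two length. This is a minimal scalar sketch of the standard unnormalized in-place algorithm; whether llama.cpp's op applies a 1/sqrt(n) normalization is not stated in these notes, so none is applied here.

```python
def fwht(a):
    """In-place unnormalized fast Walsh-Hadamard transform.

    len(a) must be a power of two. Returns the same list for convenience.
    """
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly: combine pairs that are h apart within blocks of 2*h.
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    return a
```

Applying the unnormalized transform twice returns the input scaled by n, which is a quick sanity check for any implementation.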
b9383
chat : add Granite 4.1 chat template (#23518)
b9382
vulkan: fix wrong index variable in inner loop (#23665)
b9381
vulkan: Fix memory logger unsafe iterator access (#23667)
b9380
server, ui : Add support for HTTP ETags in llama-server (#23701): allow caching of UI elements in llama-server; use fnv_hash; update tools/server/server-http.cpp; ETag has to be set always. Co-authored-by: Xuan-Son Nguyen
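A minimal sketch of hash-based ETags, assuming `fnv_hash` is 64-bit FNV-1a (the notes only say "use fnv_hash"). The server hashes the response body, always sends the ETag, and answers 304 with an empty body when the client's If-None-Match matches. The `respond` helper is hypothetical; real If-None-Match handling also has to cope with lists of tags and `W/` weak validators.

```python
FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3


def fnv1a_64(data: bytes) -> int:
    # FNV-1a: XOR each byte in, then multiply by the prime (mod 2^64).
    h = FNV64_OFFSET
    for b in data:
        h ^= b
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h


def make_etag(body: bytes) -> str:
    # Strong ETag: quoted 16-digit hex digest of the body.
    return '"%016x"' % fnv1a_64(body)


def respond(body: bytes, if_none_match=None):
    etag = make_etag(body)
    # The ETag header is sent on every response, including 304s.
    if if_none_match == etag:
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```

A non-cryptographic hash is sufficient here because the ETag only has to change when the UI asset changes.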
b9378
cuda : fix KQ mask offset integer overflow in fattn MMA kernel (#23610). Co-authored-by: Stanisław Szymczyk
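The general failure mode behind an offset overflow like this one can be shown by emulating 32-bit signed arithmetic. The row and stride values below are hypothetical, not taken from the fattn kernel, and in C/CUDA signed overflow is formally undefined; the wraparound shown is what typically happens in practice. The usual fix is to widen to 64 bits before multiplying.

```python
def wrap_int32(x):
    # Emulate two's-complement wraparound of a 32-bit signed integer.
    return (x + 2**31) % 2**32 - 2**31


def mask_offset_int32(row, row_stride):
    # Offset computed in 32-bit arithmetic (the buggy pattern).
    return wrap_int32(row * row_stride)


def mask_offset_int64(row, row_stride):
    # Offset computed after widening (fits easily in 64 bits).
    return row * row_stride
```

For example, row 20000 with a stride of 131072 elements exceeds 2^31 - 1, so the 32-bit offset wraps to a negative value and points outside the mask buffer.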
b9377
perplexity : fix format specifier in LOG_ERR (#23788). Signed-off-by: Adrien Gallouët
b9375
ggml: fixed Arm SVE usage bug in vec.h, vec.cpp (#22841). Updated vec.h/vec.cpp code to accumulate to F32 rather than F16. Change-Id: I0cb789347f2bf60ffaf9047319f727e788c825f8. Signed-off-by: Martin Klacer. Co-authored-by: Milos Puzovic…
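Why the accumulator type matters can be seen with a scalar simulation. This sketch is not the SVE code: it rounds after every multiply and add to IEEE half or single precision using Python's `struct` half-float (`"e"`) and float (`"f"`) formats. In F16 the spacing between representable values is 2 above 2048, so adding 1.0 to 2048 rounds back to 2048 and the sum stalls.

```python
import struct


def to_f16(x):
    # Round to the nearest IEEE 754 half-precision value.
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_f32(x):
    # Round to the nearest IEEE 754 single-precision value.
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot(xs, ys, rnd):
    # Dot product with the accumulator rounded to a given precision each step.
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = rnd(acc + rnd(x * y))
    return acc


xs = [1.0] * 4096
ys = [1.0] * 4096
```

With these inputs, `dot(xs, ys, to_f16)` stops at 2048.0 while `dot(xs, ys, to_f32)` reaches the exact 4096.0.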
b9374
ci : refactor (#23789): separate CUDA windows workflow + fix names; rename workflow; prefix cache names with workflow name; rename build.yml -> build-cpu.yml; cache keys; fix windows cuda/hip concurrency of release workflow; fix apple cache names; …
b9371
ggml-webgpu: remove legacy constants (#23672)
b9370
hexagon: add support for Q4_1 in MUL_MAT and MUL_MAT_ID (#23647): hex-mm: add support for Q4_1 matmul/matvec, HVX-only for now; hmx-mm: add support for Q4_1; hex-mm: use Q8_1 dynamic quantization to avoid having to compute sums in the vec_dot; hexagon: fix repack scratch buffer…
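The point about Q8_1 avoiding sums in the vec_dot follows from the block formats. This is a sketch based on our understanding of ggml's layout: a Q4_1 block stores a scale d and minimum m (value ≈ d·q + m), and a Q8_1 block stores d and s = d·Σq precomputed at quantization time. Expanding Σ(d4·q4 + m4)(d8·q8) gives d4·d8·Σ(q4·q8) + m4·s8, so the kernel needs no extra sum over the activations. Floats replace half-precision fields and the quantizers are simplified, not ggml's reference routines.

```python
def quantize_q4_1(x):
    # Simplified Q4_1-style block: 4-bit codes with a scale and a minimum.
    lo, hi = min(x), max(x)
    d = (hi - lo) / 15 if hi > lo else 1.0
    q = [min(15, max(0, round((v - lo) / d))) for v in x]
    return d, lo, q  # value ~= d * q + m, with m = lo


def quantize_q8_1(y):
    # Simplified Q8_1-style block: 8-bit codes, scale, and precomputed sum.
    amax = max(abs(v) for v in y)
    d = amax / 127 if amax > 0 else 1.0
    q = [round(v / d) for v in y]
    s = d * sum(q)  # stored with the block, so vec_dot never recomputes it
    return d, s, q


def vec_dot_q4_1_q8_1(a, b):
    d4, m4, q4 = a
    d8, s8, q8 = b
    # sum((d4*q4 + m4) * (d8*q8)) = d4*d8*sum(q4*q8) + m4*s8
    return d4 * d8 * sum(i * j for i, j in zip(q4, q8)) + m4 * s8
```

The integer inner product Σ(q4·q8) is the only per-element work left, which is what SIMD units like HVX handle well.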
b9368
vulkan: Switch MUL_MAT_VEC to 4 K per iteration for F16/32 (#22887). Against mesa git, this shows a 4.8% performance improvement for tg128 on Qwen3.5-9B:BF16 on Intel BMG. Note that this breaks some tests until the last…
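Reading "4 K per iteration" as consuming four elements of the K (reduction) dimension per inner-loop step (our interpretation, not confirmed by the notes), the loop shape looks like this scalar sketch. It keeps four independent partial sums and handles the K % 4 tail separately. A GPU shader would additionally spread rows and K-chunks across threads, which is not modelled here.

```python
def matvec_4k(A, x):
    """y = A @ x, consuming 4 K-elements per inner-loop iteration."""
    k = len(x)
    k4 = k - k % 4  # largest multiple of 4 that fits in K
    y = []
    for row in A:
        acc = [0.0, 0.0, 0.0, 0.0]  # four independent partial sums
        for i in range(0, k4, 4):
            acc[0] += row[i] * x[i]
            acc[1] += row[i + 1] * x[i + 1]
            acc[2] += row[i + 2] * x[i + 2]
            acc[3] += row[i + 3] * x[i + 3]
        # Leftover K % 4 elements that do not fill a full step.
        tail = sum(row[i] * x[i] for i in range(k4, k))
        y.append(acc[0] + acc[1] + acc[2] + acc[3] + tail)
    return y
```

Independent partial sums shorten the dependency chain on the accumulator and let wider loads feed the multiply-adds.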
b9369
ggml-webgpu: Fix how to dispatch WG to some ops (#23750)
b9367
vulkan: use GL_NV_cooperative_matrix_decode_vector for faster matmul (#23541)
b9366
vulkan: add REPEAT op support for f16 to f16 (#23298): feat: extend repeat op for vulkan; feat: add repeat_f16 vulkan pipeline; fix: ensure same dst and src types; fix: use type_size instead of data types; fix: use int16 and int32 for repeat shader op; chore: rename repeat_f* to…
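The REPEAT op tiles a source tensor to fill a larger destination of the same type. This 2-D sketch shows the indexing (each destination coordinate reads the source at that coordinate modulo the source extent). The requirement that the target shape be a multiple of the source shape reflects our understanding of ggml's repeat; real tensors have up to four dimensions and f16 storage.

```python
def repeat_2d(src, rows, cols):
    """Tile a 2-D src (list of lists) to shape rows x cols."""
    sr, sc = len(src), len(src[0])
    if rows % sr or cols % sc:
        raise ValueError("target shape must be a multiple of the source shape")
    # dst[r][c] = src[r mod sr][c mod sc]
    return [[src[r % sr][c % sc] for c in range(cols)] for r in range(rows)]
```

Because each output element is an independent lookup, a shader can assign one invocation per destination element.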
b9365
ci : move ARM jobs to self-hosted + disable kleidiai mac release (#23780): move ARM jobs to 3rd-party runners + disable kleidiai release; cont : fix deps + fix names; ocd : fix names; cont : fix PR links
b9360
common : fix env names to all have LLAMA_ARG_ prefix (#23778)
b9357
vulkan: avoid preferring transfer queue on AMD UMA devices (#22455)
b9354
convert: add MiniCPM5 tokenizer support (#23384). Add minicpm5 pre-tokenizer hash via convert_hf_to_gguf_update.py and implement hardcoded regex handling in llama-vocab.cpp, consistent with other BPE pre-tokenizers. Co-authored-by: zhangtao
b9353
server : fix the log message when using SSL (#23393). When llama-server is started with an SSL key and cert, the log says that it listens on http instead of https. This patch fixes this.
b9352
ggml-zendnn : fixed naming of matmul function (#20964): fixed naming of mul_mat_id function; fixed print in mul_mat_id. Co-authored-by: plotnikov.v10
b9351
macOS/iOS: macOS Apple Silicon (arm64), macOS Apple Silicon (arm64, KleidiAI enabled), macOS Intel (x64), iOS XCFramework; Linux: Ubuntu x64 (CPU), Ubuntu arm64 (CPU), Ubuntu s390x (CPU), Ubuntu x64 (Vulkan), Ubuntu arm64 (Vulkan), Ubuntu x64 (ROCm 7.2), Ubuntu x64 (OpenVINO), Ubuntu x64…
b9334
CUDA: missing PDL sync for FWHT, better fallback (#23690)
b9333
metal : add apple device id (#23566). Co-authored-by: lvyichen
b9331
ci : reduce PR jobs by matching backend paths (#23675): disable SYCL f16 builds; extract android and hip into separate workflows; move webgpu to separate workflow; move the rpc to a separate workflow; extract s309x and ppcl jobs; extract opencl job into…
b9330
model: tag ffn_latent as MUL_MAT to fix buft probe (#23664). ffn_latent_down/up are declared GGML_OP_MUL in LLM_TENSOR_INFOS but nemotron-h feeds them through ggml_mul_mat. The loader buft probe asks the backend about the declared op, so it tested an elementwise MUL on a q8_0…
b9329
CUDA: add fast walsh-hadamard transform (#23615): review: add unrolls + change size_t -> int; warp size 64. Co-authored-by: Johannes Gäßler
b9326
sync : ggml
b9320
TP: fix ggml context size calculation (#22616): fix ggml context size calculation, memory leak; move split state cache back into the context; revert to constant ggml context size for cgraphs; increase headroom for statically allocated tensors; remove obsolete include
b9319
ggml: gguf_init_from_callback and gguf_init_from_buffer (#22341): implement gguf_init_from_buffer; test: gguf_init_from_buffer; fix: memory breakdown for a model loaded with no_alloc from a file is consistent with being loaded from a buffer; fix: use GGML_UNUSED…
b9318
server: MTP layer kv-cache should respect draft type ctk (#23646)
b9315
llama : document that only one on-device state can be saved per sequence (#23520)