llama.cpp releases
-
llama.cpp releases dev-tools 21d ago
b9559
cli: fix spinner not showing during prompt processing ( #24283 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…
10 -
llama.cpp releases dev-tools 21d ago
b9563
docker: install ffmpeg in the released image ( #24302 )
24 -
llama.cpp releases dev-tools 21d ago
b9558
vulkan: Use cm2 decode_vector for mul_mat_id B matrix loads ( #23991 ) This allows vec4 loads of the B elements. Also increase BK to 64 when this is enabled. Neither of these alone is consistently faster, but together these give a nice speedup. In ggml-vulkan.cpp, we need to…
28 -
llama.cpp releases dev-tools 21d ago
b9557
cuda: reset cuda context after reading memory size ( #23935 ) cuda: reset device in get_memory function if no backend is active; also count device and host buffers; exclude hip and musa from counting and device reset; use device mutex instead of atomic; undo backend_free function…
34 -
llama.cpp releases dev-tools 21d ago
b9556
HIP: add gfx1152 and gfx1153 to RDNA3.5 ( #24129 )
10 -
llama.cpp releases dev-tools 21d ago
b9555
metal : fix im2col 1D case (audio models) ( #24220 )
29 -
llama.cpp releases dev-tools 22d ago
b9553
common : relax sampler name matching ( #23744 ) Currently, in some cases, the alternative names for samplers (like top-k and min-p instead of the canonical top_k and min_p) are not always recognized by the common_sampler_types_from_names…
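The idea behind relaxed name matching can be illustrated with a minimal sketch. This is not the code from the pull request: the helper names and the two-entry table below are hypothetical, and the only facts taken from the entry are the canonical names top_k and min_p and their alternative spellings top-k and min-p.

```python
# Hypothetical sketch of relaxed sampler-name matching.
# This does not reproduce llama.cpp's common_sampler_types_from_names.

# Canonical names mentioned in the release note (illustrative subset only).
CANONICAL_SAMPLERS = ("top_k", "min_p")


def normalize_sampler_name(name: str) -> str:
    """Lowercase the name and treat '-' and '_' as the same separator."""
    return name.strip().lower().replace("-", "_")


def sampler_types_from_names(names):
    """Return canonical names for recognized samplers, skipping unknown ones."""
    known = set(CANONICAL_SAMPLERS)
    result = []
    for raw in names:
        key = normalize_sampler_name(raw)
        if key in known:
            result.append(key)
    return result


if __name__ == "__main__":
    print(sampler_types_from_names(["top-k", "Min_P", "bogus"]))
```

Normalizing before the lookup means both spellings resolve to a single canonical key, so no separate alias table has to be kept in sync.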
32 -
llama.cpp releases dev-tools 22d ago
b9551
kv-cache : avoid kv cells copies ( #24277 )
7 -
llama.cpp releases dev-tools 22d ago
b9550
kv-cache: follow the source cache size when sharing cells ( #24267 ) A fitted target context can end up smaller than the draft default; the oversized assistant views then overflow the shared K/V tensors and trip the ggml_view_4d size assert during graph reserve.
25 -
llama.cpp releases dev-tools 22d ago
b9549
llama : add Gemma4 MTP ( #23398 )
18 -
llama.cpp releases dev-tools 22d ago
b9548
spec : fix vocab compatibility check ( #24256 )
14 -
llama.cpp releases dev-tools 22d ago
b9547
arg: Skip mmproj download when user supplied mmproj ( #24239 )
32 -
llama.cpp releases dev-tools 23d ago
b9544
common/chat : fix LFM2/LFM2.5 reasoning round-trip and leak ( #24234 ) common/chat : fix LFM2 reasoning round-trip and stray leak. Gate by reasoning format and whether the template supports…
30 -
llama.cpp releases dev-tools 23d ago
b9543
mtmd: support "frame merge" for qwen-vl-based models ( #21858 ) feat: add video support for Qwen3.5; various clean up; revise the design; fix llava-uhd case; nits; nits 2. Co-authored-by: andrewmd5
37 -
llama.cpp releases dev-tools 23d ago
b9542
completion : remove useless statics ( #24226 ) Signed-off-by: Adrien Gallouët
6 -
llama.cpp releases dev-tools 23d ago
b9541
completion : fix format specifier in LOG_INF ( #24213 ) Signed-off-by: Adrien Gallouët
7 -
llama.cpp releases dev-tools 24d ago
b9538
model : rename local n_layer_all variable ( #24209 )
15 -
llama.cpp releases dev-tools 24d ago
b9537
context : fix off-by-one comparisons to n_gpu_layers ( #24208 )
37 -
llama.cpp releases dev-tools 24d ago
b9536
opencl: improve get_rows, cpy, concat and q6_k flat gemv ( #24160 ) opencl: allow multiple workgroups for large rows; opencl: improve small cpy; opencl: packed concat for small input; opencl: tweak flat q6_K gemv, increase N_DST and remap threads
27 -
llama.cpp releases dev-tools 24d ago
b9535
common/chat : unify and fix LFM2/LFM2.5 tool parser ( #24178 )
19 -
llama.cpp releases dev-tools 24d ago
b9534
vulkan: add fwht support for Intel with shmem reduction ( #23964 ) don't use N as workgroup size; disable subgroup shuffle on MoltenVK AMD; disable fwht shader on Intel Windows due to driver bug
21 -
llama.cpp releases dev-tools 24d ago
b9533
model: fix build failure ( #24193 )
28 -
llama.cpp releases dev-tools 24d ago
b9531
TP: round up granularity to 128 ( #24180 ) remove assert
25 -
llama.cpp releases dev-tools 24d ago
b9530
cli: fix model params not propagated ( #23893 ) Fixes #23847
21 -
llama.cpp releases dev-tools 24d ago
b9529
model : fix llama_model::n_gpu_layers() ( #24188 )
36 -
llama.cpp releases dev-tools 24d ago
b9528
ui: run npm install when package-lock.json is newer than node_modules ( #24171 )
17 -
llama.cpp releases dev-tools 25d ago
b9524
minor : fix lint issues ( #24165 )
17 -
llama.cpp releases dev-tools 25d ago
b9523
hparams : refactor hparams.n_layer ( #24060 ) cont : remove n_layer_kv(), use n_layer_all instead; cont : type consistency; pi : update SYSTEM.md; models : fix Step3.5 MTP; cont : remove duplicate switch cases; cont : explicitly set false to extra…
30 -
llama.cpp releases dev-tools 25d ago
b9522
kleidiai : dynamic chunk-based scheduling for hybrid execution ( #23819 )
16 -
llama.cpp releases dev-tools 25d ago
b9521
CUDA: enroll mul_mat_vec_q_moe into pdl ( #24087 ) Enroll mul_mat_vec_q_moe into PDL, boosting MTP performance on BW. Data collected on a B4500. Before (llama.cpp): python mtp-bench.py code_python pred= 192 draft= 150 acc= 116 rate=0.773 tok/s=202.8…
10 -
llama.cpp releases dev-tools 25d ago
b9519
sycl : port multi-column MMVQ from CUDA backend ( #21845 ) mmvq: Port the ncols_dst optimization from ggml-cuda/mmvq.cu to SYCL. Read weights once per dispatch instead of once per column. Covers all standard quant types + reorder paths for Q4_0, Q8_0, Q3_K, Q4_K, Q5_K, Q6_K. IQ…
4 -
llama.cpp releases dev-tools 25d ago
b9518
server : disable on-device spec checkpoints ( #24108 )
15 -
llama.cpp releases dev-tools 25d ago
b9515
Move duplicated imatrix code into single common imatrix-loader.cpp ( #22445 ) Deduplicate imatrix loading code; add back LLAMA_TRACE, early exit on quantize missing metadata
26 -
llama.cpp releases dev-tools 25d ago
b9512
return filter to save memory ( #24125 ) Co-authored-by: lvyichen
25 -
llama.cpp releases dev-tools 25d ago
b9510
ggml: vectorize ggml_vec_dot_q4_1_q8_1 with WASM SIMD128 ( #22209 ) Optimize the inner loop of ggml_vec_dot_q4_1_q8_1_generic using WASM SIMD128 intrinsics, gated behind #ifdef __wasm_simd128__ so non-wasm builds are…
11 -
llama.cpp releases dev-tools 25d ago
b9509
server: avoid unnecessary checkpoint restore when new tokens are present ( #24110 ) The pos_min_thold calculation unconditionally subtracts 1 to ensure at least one token is evaluated for logits when no new…
21 -
llama.cpp releases dev-tools 25d ago
b9505
server : add header to tools/server/server-http.h ( #24089 )
29 -
llama.cpp releases dev-tools 25d ago
b9504
cmake: skip cvector-generator and export-lora when CPU backend is disabled ( #24053 )
4 -
llama.cpp releases dev-tools 25d ago
b9503
fix(mtmd): handle Gemma 4 audio projector embedding size ( #24091 ) mtmd: handle Gemma 4 audio projector embedding size; rm projection_dim from clip_n_mmproj_embd. Co-authored-by: Xuan Son Nguyen
28 -
llama.cpp releases dev-tools 26d ago
b9500
metal : reduce rset heartbeat from 500ms -> 5ms ( #24074 )
37 -
llama.cpp releases dev-tools 26d ago
b9499
ggml-webgpu: FlashAttention refactor + standardize quantization support ( #23834 ) Start work on flash_attn refactor Refactor Split k/v quantization Refactor and abstract quantization logic for flash_attn and mul_mat Add quantization support to tile path formatting Move to…
23 -
llama.cpp releases dev-tools 26d ago
b9498
ggml-cpu: extend RVV quantization vec dot to higher VLENs ( #22754 ) ggml-cpu: add rvv 512b, 1024b impls for iq4_xs. ggml-cpu: refactor; add rvv 512b, 1024b impls for q6_K, i-quants. ggml-cpu: refactor; add 512 and 1024 implementations of tq3_s, iq3_xxs, iq2_s, iq2_xs, iq2_xxs…
22 -
llama.cpp releases dev-tools 26d ago
b9501
tests : refactor test-save-load-state to accept token input ( #24073 ) Default prompt is now empty; when not provided, generate n_batch random tokens (useful for models without a tokenizer). Tokenization happens once upfront; pass token vector to test functions. generate_tokens prints token…
26 -
llama.cpp releases dev-tools 26d ago
b9496
mtmd: fix Gemma 4 unified FPE ( #24088 )
27 -
llama.cpp releases dev-tools 26d ago
b9495
qwen35: use post-norm hidden state for MTP ( #24025 ) rename pre_norm to nextn; fix step35
37 -
llama.cpp releases dev-tools 26d ago
b9494
mtmd: enable non-causal vision for gemma 4 unified ( #24082 )
22 -
llama.cpp releases dev-tools 26d ago
b9493
mtmd, model: allow skip build_vit() ( #24077 ) add model; nits
26 -
llama.cpp releases dev-tools 26d ago
b9491
Avoid PDL race conditions by disabling __restrict__ when PDL is used ( #24030 ) Removes __restrict__ from PDL kernel headers due to incompatibility with PDL. Adds preprocessor directives based on arch in kernel body to add __restrict__ to retain performance on older architectures.…
9 -
llama.cpp releases dev-tools 26d ago
b9490
ggml-cpu: use runtime SVE width in FWHT ( #24059 )
32