llama.cpp releases

455 articles archived

llama.cpp releases dev-tools 21d ago

b9559

cli: fix spinner not show during prompt processing ( #24283 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

10
llama.cpp releases dev-tools 21d ago

b9563

docker: install ffmpeg in the released image ( #24302 )

24
llama.cpp releases dev-tools 21d ago

b9558

vulkan: Use cm2 decode_vector for mul_mat_id B matrix loads ( #23991 ) This allows vec4 loads of the B elements. Also increase BK to 64 when this is enabled. Neither of these alone is consistently faster, but together these give a nice speedup. In ggml-vulkan.cpp, we need to…

28
llama.cpp releases dev-tools 21d ago

b9557

cuda: reset cuda context after reading memory size ( #23935 ) cuda: reset device in get_memory function if no backend is active also count device and host buffers exclude hip and musa from counting and device reset use device mutex instead of atomic undo backend_free function…

34
llama.cpp releases dev-tools 21d ago

b9556

HIP: add gfx1152 and gfx1153 to RDNA3.5 ( #24129 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

10
llama.cpp releases dev-tools 21d ago

b9555

metal : fix im2col 1D case (audio models) ( #24220 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

29
llama.cpp releases dev-tools 22d ago

b9554: [SYCL] Update compute runtime version to 26.x in docker (#24070)

update compute runtime from 25 to 26 in docker add comment with old driver for multiple GPUs

12
llama.cpp releases dev-tools 22d ago

b9553

common : relax sampler name matching ( #23744 ) common : relax sampler name matching Currently, in some cases, the alternative names for samplers (like top-k and min-p instead of the canonical top_k and min_p ) are not always recognized by the common_sampler_types_from_names…

32
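
The relaxed matching described in b9553 can be illustrated with a small sketch. This is not the llama.cpp implementation (that lives in C++ and its details are truncated above); the function names and the sampler list below are hypothetical and only show the general idea of normalizing alternative spellings such as top-k to the canonical top_k before lookup.

```python
# Hypothetical illustration only: names and the sampler list are assumptions,
# not taken from llama.cpp.
CANONICAL_SAMPLERS = ("top_k", "top_p", "min_p", "temperature")


def normalize_sampler_name(name: str) -> str:
    """Map an alternative spelling (e.g. 'Top-K') to a canonical form ('top_k')."""
    return name.strip().lower().replace("-", "_")


def match_samplers(names):
    """Return canonical names for every recognized entry, preserving input order."""
    matched = []
    for raw in names:
        key = normalize_sampler_name(raw)
        if key in CANONICAL_SAMPLERS:
            matched.append(key)
        # Unrecognized names are skipped in this sketch; a real tool might warn.
    return matched
```

With this sketch, `match_samplers(["top-k", "min_p", "bogus"])` keeps the first two entries in canonical form and drops the unknown one.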
llama.cpp releases dev-tools 22d ago

b9551

kv-cache : avoid kv cells copies ( #24277 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…

7
llama.cpp releases dev-tools 22d ago

b9550

kv-cache: follow the source cache size when sharing cells ( #24267 ) A fitted target context can end up smaller than the draft default, the oversized assistant views then overflow the shared K/V tensors and trip the ggml_view_4d size assert during graph reserve. macOS/iOS: macOS…

25
llama.cpp releases dev-tools 22d ago

b9549

llama : add Gemma4 MTP ( #23398 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64…

18
llama.cpp releases dev-tools 22d ago

b9548

spec : fix vocab compatibility check ( #24256 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…

14
llama.cpp releases dev-tools 22d ago

b9547

arg: Skip mmproj download when user supplied mmproj ( #24239 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

32
llama.cpp releases dev-tools 23d ago

b9544

common/chat : fix LFM2/LFM2.5 reasoning round-trip and leak ( #24234 ) common/chat : fix LFM2 reasoning round-trip and stray leak Gate by reasoning format and whether the template supports macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled)…

30
llama.cpp releases dev-tools 23d ago

b9543

mtmd: support "frame merge" for qwen-vl-based models ( #21858 ) feat: add video support for Qwen3.5 various clean up revise the design fix llava-uhd case nits nits 2 Co-authored-by: andrewmd5 macOS/iOS: macOS Apple Silicon (arm64) macOS…

37
llama.cpp releases dev-tools 23d ago

b9542

completion : remove useless statics ( #24226 ) Signed-off-by: Adrien Gallouët macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu…

6
llama.cpp releases dev-tools 23d ago

b9541

completion : fix format specifier in LOG_INF ( #24213 ) Signed-off-by: Adrien Gallouët macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU)…

7
llama.cpp releases dev-tools 24d ago

b9538

model : rename local n_layer_all variable ( #24209 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

15
llama.cpp releases dev-tools 24d ago

b9537

context : fix off-by-one comparisons to n_gpu_layers ( #24208 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…

37
llama.cpp releases dev-tools 24d ago

b9536

opencl: improve get_rows, cpy, concat and q6_k flat gemv ( #24160 ) opencl: allow multiple workgroups for large rows opencl: improve small cpy opencl: packed concat for small input opencl: tweak flat q6_K gemv, increase N_DST and remap threads macOS/iOS: macOS Apple Silicon…

27
llama.cpp releases dev-tools 24d ago

b9535

common/chat : unify and fix LFM2/LFM2.5 tool parser ( #24178 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

19
llama.cpp releases dev-tools 24d ago

b9534

vulkan: add fwht support for Intel with shmem reduction ( #23964 ) vulkan: add fwht support for Intel with shmem reduction don't use N as workgroup size disable subgroup shuffle on MoltenVK AMD disable fwht shader on Intel Windows due to driver bug macOS/iOS: macOS Apple Silicon…

21
llama.cpp releases dev-tools 24d ago

b9533

model: fix build failed ( #24193 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64…

28
llama.cpp releases dev-tools 24d ago

b9531

TP: round up granularity to 128 ( #24180 ) TP: round up granularity to 128 remove assert macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU)…

25
llama.cpp releases dev-tools 24d ago

b9530

cli: fix model params not propagated ( #23893 ) Fixes #23847 macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

21
llama.cpp releases dev-tools 24d ago

b9529

model : fix llama_model::n_gpu_layers() ( #24188 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

36
llama.cpp releases dev-tools 24d ago

b9528

ui: run npm install when package-lock.json is newer than node_modules ( #24171 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu…

17
llama.cpp releases dev-tools 25d ago

b9524

minor : fix lint issues ( #24165 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64…

17
llama.cpp releases dev-tools 25d ago

b9523

hparams : refactor hparams.n_layer ( #24060 ) hparams : refactor hparams.n_layer cont : remove n_layer_kv() , use n_layer_all instead cont : type consistency pi : update SYSTEM.md models : fix Step3.5 MTP cont : remove duplicate switch cases cont : explicitly set false to extra…

30
llama.cpp releases dev-tools 25d ago

b9522

kleidiai : dynamic chunck-based scheduling for hybrid execution ( #23819 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…

16
llama.cpp releases dev-tools 25d ago

b9521

CUDA: enroll mul_mat_vec_q_moe into pdl ( #24087 ) Enroll mul_mat_vec_q_moe into PDL, boosting MTP performance on BW Data collected on a B4500: Before (llama.cpp) ➜ llama.cpp git:(master) ✗ python mtp-bench.py code_python pred= 192 draft= 150 acc= 116 rate=0.773 tok/s=202.8…

10
llama.cpp releases dev-tools 25d ago

b9519

sycl : port multi-column MMVQ from CUDA backend ( #21845 ) mmvq: Port the ncols_dst optimization from ggml-cuda/mmvq.cu to SYCL. Read weights once per dispatch instead of once per column. Covers all standard quant types + reorder paths for Q4_0, Q8_0, Q3_K, Q4_K, Q5_K, Q6_K. IQ…

4
llama.cpp releases dev-tools 25d ago

b9518

server : disable on-device spec checkpoints ( #24108 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

15
llama.cpp releases dev-tools 25d ago

b9515

Move duplicated imatrix code into single common imatrix-loader.cpp ( #22445 ) Deduplicate imatrix loading code Add back LLAMA_TRACE, early exit on quantize missing metadata macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel…

26
llama.cpp releases dev-tools 25d ago

b9512

return filter to save memory ( #24125 ) Co-authored-by: lvyichen macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU)…

25
llama.cpp releases dev-tools 25d ago

b9510

ggml: vectorize ggml_vec_dot_q4_1_q8_1 with WASM SIMD128 ( #22209 ) ggml: vectorize ggml_vec_dot_q4_1_q8_1 with WASM SIMD128 Optimize the inner loop of ggml_vec_dot_q4_1_q8_1_generic using WASM SIMD128 intrinsics, gated behind #ifdef wasm_simd128 so non-wasm builds are…

11
llama.cpp releases dev-tools 25d ago

b9509

server: avoid unnecessary checkpoint restore when new tokens are present ( #24110 ) server: avoid unnecessary checkpoint restore when new tokens are present The pos_min_thold calculation unconditionally subtracts 1 to ensure at least one token is evaluated for logits when no new…

21
llama.cpp releases dev-tools 25d ago

b9505

server : add header to tools/server/server-http.h ( #24089 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

29
llama.cpp releases dev-tools 25d ago

b9504

cmake: skip cvector-generator and export-lora when CPU backend is disabled ( #24053 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU)…

4
llama.cpp releases dev-tools 25d ago

b9503

fix(mtmd): handle Gemma 4 audio projector embedding size ( #24091 ) mtmd: handle Gemma 4 audio projector embedding size rm projection_dim from clip_n_mmproj_embd Co-authored-by: Xuan Son Nguyen macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64,…

28
llama.cpp releases dev-tools 26d ago

b9500

metal : reduce rset heartbeat from 500ms -> 5ms ( #24074 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

37
llama.cpp releases dev-tools 26d ago

b9499

ggml-webgpu: FlashAttention refactor + standardize quantization support ( #23834 ) Start work on flash_attn refactor Refactor Split k/v quantization Refactor and abstract quantization logic for flash_attn and mul_mat Add quantization support to tile path formatting Move to…

23
llama.cpp releases dev-tools 26d ago

b9498

ggml-cpu: extend RVV quantization vec dot to higher VLENs ( #22754 ) ggml-cpu: add rvv 512b,1024b impls for iq4_xs ggml-cpu: refactor; add rvv 512b, 1024b impls for q6_K, i-quants ggml-cpu: refactor; add 512 and 1024 implementations of tq3_s, iq3_xxs, iq2_s, iq2_xs, iq2_xxs…

22
llama.cpp releases dev-tools 26d ago

b9501: tests : refactor test-save-load-state to accept token input (#24073)

tests : refactor test-save-load-state to accept token input Default prompt is now empty; when not provided, generate n_batch random tokens (useful for models without a tokenizer) Tokenization happens once upfront; pass token vector to test functions generate_tokens prints token…

26
llama.cpp releases dev-tools 26d ago

b9496

mtmd: fix Gemma 4 unified FPE ( #24088 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu…

27
llama.cpp releases dev-tools 26d ago

b9495

qwen35: use post-norm hidden state for MTP ( #24025 ) qwen35: use post-norm hidden state for MTP rename pre_norm to nextn fix step35 macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64…

37
llama.cpp releases dev-tools 26d ago

b9494

mtmd: enable non-causal vision for gemma 4 unified ( #24082 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

22
llama.cpp releases dev-tools 26d ago

b9493

mtmd, model: allow skip build_vit() ( #24077 ) add model nits macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

26
llama.cpp releases dev-tools 26d ago

b9491

Avoid PDL race conditions by disabling restrict when PDL is used ( #24030 ) Removes restrict from PDL kernel headers due to incompatibility with PDL. Adds preprocessor directives based on arch in kernel body to add restrict to retain performance on older architectures.…

9
llama.cpp releases dev-tools 26d ago

b9490

ggml-cpu: use runtime SVE width in FWHT ( #24059 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

32
