llama.cpp releases
-
llama.cpp releases dev-tools 11d ago
b9701
mtmd: refactor preprocessor, add mtmd_image_preproc_out ( #24736 ) add mtmd_image_preproc_out; add dev docs; remove unused clip API; rm unused clip_image_f32_batch::grid; change preprocess() call signature. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI…
15 -
llama.cpp releases dev-tools 11d ago
b9700
[SYCL] rename GGML_SYCL_SUPPORT_LEVEL_ZERO ( #24719 ) rename GGML_SYCL_SUPPORT_LEVEL_ZERO to GGML_SYCL_SUPPORT_LEVEL_ZERO_API, and GGML_SYCL_ENABLE_LEVEL_ZERO to GGML_SYCL_USE_LEVEL_ZERO_API; fix code format; fix error when rebasing. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple…
31 -
llama.cpp releases dev-tools 11d ago
b9699
sycl : support MUL_MAT and OUT_PROD with Q1_0 ( #24721 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
31 -
llama.cpp releases dev-tools 11d ago
b9698
app : enable self-update only when built with llama-install.sh ( #24754 ) Signed-off-by: Adrien Gallouët. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU)…
34 -
llama.cpp releases dev-tools 11d ago
b9697
ci : fix check-release message parsing ( #24751 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
25 -
llama.cpp releases dev-tools 12d ago
b9694
ci : fix Windows x64 (OpenVINO) release link ( #24731 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
28 -
llama.cpp releases dev-tools 12d ago
b9693
metal : check for BF16 support in concat kernel ( #24747 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…
16 -
llama.cpp releases dev-tools 12d ago
b9692
mtmd: llava_uhd should no longer use batch dim ( #24732 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…
32 -
llama.cpp releases dev-tools 12d ago
b9691
ggml-cpu: Conditionally enable power11 backend based on compiler support ( #24687 ) Guard POWER11 backend creation behind a compiler flag check for -mcpu=power11. This avoids build failures on current GCC/Clang…
14 -
llama.cpp releases dev-tools 12d ago
b9690
metal : implement rope_back operator ( #24725 ) Reuse existing rope kernels with a function constant to toggle forward/backward rotation, avoiding duplicate kernel code. Assisted-by: pi:llama.cpp/Qwen3.6-27B macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64,…
27 -
llama.cpp releases dev-tools 12d ago
b9689
metal : add f16 and bf16 support for concat operator ( #24724 ) Extend the Metal backend concat operator to support f16 and bf16 tensor types in addition to the existing f32 and i32 support. Template kernel_concat on type T…
34 -
llama.cpp releases dev-tools 12d ago
b9688
server: (router) add model management API ( #23976 ) wip; server: (router) add SSE realtime updates API; nits; wip; add download API; update docs; add delete endpoint; fix std::terminate; fix crash; fix 2; add tests; nits. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple…
17 -
llama.cpp releases dev-tools 12d ago
b9687
llama : skip main_gpu validation when no devices are available ( #23405 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…
11 -
llama.cpp releases dev-tools 12d ago
b9686
spec: fix segfault error on long prompts for eagle3 ( #24707 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…
17 -
llama.cpp releases dev-tools 12d ago
b9685
[SYCL] add dev2dev memcpy by SYCL API ( #24476 ) move GGML_SYCL_DEV2DEV_MEMCPY to runtime table; update the detection method for p2p comm; fix the error introduced while resolving a conflict. Co-authored-by: Neo Zhang. macOS/iOS: macOS Apple Silicon (arm64) macOS…
33 -
llama.cpp releases dev-tools 12d ago
b9684
[SYCL] Add conv_3d ( #24691 ) add conv_3d; optimize; update ops.md; restore test script; rm unused code; rm copyright notes. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu…
15 -
llama.cpp releases dev-tools 12d ago
b9682
vulkan: record actual memory properties during buffer creation ( #24326 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…
9 -
llama.cpp releases dev-tools 12d ago
b9678
opencl: optimize mul_mat_f16_f32_l4 for decode ( #24504 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…
4 -
llama.cpp releases dev-tools 12d ago
b9677
common: update logging to enforce max_capacity and optimize queue resizing ( #24490 ) common/log: remove queue expansion logic. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64,…
35 -
llama.cpp releases dev-tools 12d ago
b9675
sycl : enable fp16 support for ops SQR, SQRT, LOG, SIN, COS, CLAMP ( #24692 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu…
33 -
llama.cpp releases dev-tools 12d ago
b9674
SYCL: fix use-after-free bug with async memcpy in MoE prefill ( #24676 ) SYCL: fix a bug with async memcpy; make mmid_row_mapping_host persistent; comment on stream->wait; apply suggestions from @sanmai. macOS/iOS: macOS…
34 -
llama.cpp releases dev-tools 13d ago
b9673
sycl: Add optional USM system allocations ( #22526 ) This introduces an optional feature to allocate large GPU buffers (≥ 1GB) using USM system allocations if supported by the device. It allows using buffers from the system allocator then letting the system manage memory…
18 -
llama.cpp releases dev-tools 13d ago
b9672
vendor : update BoringSSL to 0.20260616.0 ( #24693 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
31 -
llama.cpp releases dev-tools 13d ago
b9670
Fix and restrict NVFP4 edge-cases in llama-graph ( #24331 ) Move the post-GEMM MUL required for dequant before the LoRA and bias add; see #23484. For LoRA, I would presume we want fully dequantized values before doing the residuals, but this depends on how the LoRAs were generated.…
26 -
llama.cpp releases dev-tools 13d ago
b9669
spec: add backend sampling support for eagle3 ( #24655 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
27 -
llama.cpp releases dev-tools 13d ago
b9668
vulkan: prefer host-visible memory buffers on UMA devices ( #22930 ) implement UMA host-visible memory; update based on 0cc4m's suggestion. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu…
37 -
llama.cpp releases dev-tools 13d ago
b9667
vulkan: Support gated_delta_net with S_v=16 ( #24581 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
38 -
llama.cpp releases dev-tools 13d ago
b9665
bench : add --offline ( #24511 ) Add default. Signed-off-by: Adrien Gallouët. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon…
29 -
llama.cpp releases dev-tools 14d ago
b9663
[SYCL] Support OP EXPM1, support all UT cases of FLOOR, TRUNC, ROUND ( #24363 ) fix conflict; rebase, support new UT case of repeat, concat. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI…
17 -
llama.cpp releases dev-tools 14d ago
b9664
sycl: support reordered Q4_K/Q5_K/Q6_K MoE MUL_MAT_ID ( #24452 ) sycl: support reordered Q4_K and Q5_K MoE MUL_MAT_ID. Extend reordered-weight handling to fused MoE MUL_MAT_ID for Q4_K and Q5_K expert tensors and add Q5_K reordered DMMV coverage. Unsupported 3D reorder cases now fall back instead of aborting. sycl: extend MoE reorder to Q6_K…
21 -
llama.cpp releases dev-tools 14d ago
b9661
vulkan: add col2im_1d op ( #24425 ) vulkan: add GGML_OP_COL2IM_1D, follow-up to the CPU op; vulkan: col2im_1d bounded gather loop instead of full-K scan with modulo; vulkan: col2im_1d address review from @jeffbolznv; vulkan: col2im_1d return nullptr for unsupported types, address…
20 -
llama.cpp releases dev-tools 14d ago
b9660
chat : fix LFM2 tool-call parsing double-escaping ( #24667 ) Add escape test cases. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64…
20 -
llama.cpp releases dev-tools 14d ago
b9659
mtmd: fix miscounting n_tokens ( #24656 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…
20 -
llama.cpp releases dev-tools 14d ago
b9658
chat: include full unparsed prompt in debug message on parse error ( #24650 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…
5 -
llama.cpp releases dev-tools 14d ago
b9656
chat: harden peg-native tool call parsing ( #24329 ) Accept an optional leading type: function field in build_json_tools_flat_keys so OpenAI-style tool calls parse on templates whose serialization opens on the name field. Return a clean…
27 -
llama.cpp releases dev-tools 14d ago
b9655
chat: fix an "oldie but goodie" grammar generator bug that surfaced during last changes ( #24653 ) update erroneous case in PEG parser test. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple…
38 -
llama.cpp releases dev-tools 14d ago
b9654
mtmd : add post-decode callback ( #24645 ) Assisted-by: pi:llama.cpp/Qwen3.6-27B macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu…
14 -
llama.cpp releases dev-tools 14d ago
b9653
vulkan: support more CONCAT types ( #24579 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…
35 -
llama.cpp releases dev-tools 14d ago
b9652
wasm : fix fallback symbol collision ( #24639 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…
4 -
llama.cpp releases dev-tools 14d ago
b9651
SYCL: use native subgroup size for K-quant DMMV ( #21700 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…
15 -
llama.cpp releases dev-tools 14d ago
b9650
sycl: fix soft_max_f32 max reduction ( #24451 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…
18 -
llama.cpp releases dev-tools 14d ago
b9649
sycl : fix reorder function; add fp32/fp16 in build script ( #24578 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…
31 -
llama.cpp releases dev-tools 14d ago
b9647
[SYCL] add support for pool_1d, move pool_1d/2d code to pool.cpp/hpp ( #24584 ) update ops.md. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS…
25 -
llama.cpp releases dev-tools 14d ago
b9646
[SYCL]: Remove per-allocation Level Zero runtime checks ( #23399 ) [SYCL] Centralize Level Zero detection in ggml_sycl_init; use the same wording; get back the warning; [SYCL] Remove per-allocation getenv() for GGML_SYCL_ENABLE_LEVEL_ZERO; bring back the comment; move it up to make…
10 -
llama.cpp releases dev-tools 14d ago
b9645
metal : add repeat bf16 ( #24638 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64…
36 -
llama.cpp releases dev-tools 14d ago
b9644
chat: fix whitespace problems once and for all ( #24624 ) Purge trailing spaces from grammar generation; Revert "Purge trailing spaces from grammar generation" (this reverts commit b0827ec). macOS/iOS: macOS Apple Silicon (arm64)…
30 -
llama.cpp releases dev-tools 14d ago
b9642
CUDA: only support F32/F16 for GGML_OP_REPEAT ( #24533 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
33 -
llama.cpp releases dev-tools 15d ago
b9641
ggml-webgpu: improve i-quants mul_mat performance and speed up prefil…
23 -
llama.cpp releases dev-tools 15d ago
b9637
chat: add dedicated Cohere2MoE (North Code) parser ( #24615 ) Some renames to make @CISC happy :> macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework…
25