llama.cpp releases
454 articles archived
-
llama.cpp releases dev-tools 5h ago
b9843
Revert "sched : reintroduce less synchronizations during split compute ( #20793 )" ( #25138 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x…
33 -
llama.cpp releases dev-tools 14h ago
b9842
common : dedup preset and cached model entries in /v1/models ( #25131 ) Signed-off-by: Adrien Gallouët
29 -
llama.cpp releases dev-tools 20h ago
b9840
DeepSeek V4 ( #24162 ) convert: add dsv4 conversion add basic setup add llm_graph_input_dsv4 add save-load state add sinkhorn eps - correction by @fairydreaming add rope fix cleanup dead code fix bugs support pro model: added by @fairydreaming remove redundant V cache Chat…
26 -
llama.cpp releases dev-tools 21h ago
b9839
tools/ui: restore Tailwind scanning in ignored worktrees ( #24879 )
24 -
llama.cpp releases dev-tools 23h ago
b9838
common : remove unused regex-partial ( #25118 )
24 -
llama.cpp releases dev-tools 1d ago
b9837
jinja, chat: add --reasoning-preserve flag ( #25105 ) jinja, chat: add --reasoning-preserve flag correct help message
28 -
llama.cpp releases dev-tools 1d ago
b9835
ui: fix stop and reasoning skip in single-model mode ( #25084 )
15 -
llama.cpp releases dev-tools 1d ago
b9833
chat : implement minicpm5 parser ( #24889 ) Add minicpm5 tool call parser Refactor MiniCPM5 PEG parser per review feedback Fix jinja min/max API to match Jinja2 modify by review MiniCPM5: use autoparser for XML tool calls and fix grammar preserved-token triggers MiniCPM5: fix…
26 -
llama.cpp releases dev-tools 1d ago
b9831
spec : add DFlash support ( #22105 ) spec: add DFlash v2 support dflash: support sliding window attention per layer_types docs: add dflash section Co-authored-by: Kashif Rasul
12 -
llama.cpp releases dev-tools 1d ago
b9830
common : allow --offline in llama download ( #25091 ) Expose the existing --offline flag to llama download so a script can run it to check whether a model is already cached and ready to be served without touching the network. Also fix a latent use-after-free in the URL-task…
4 -
llama.cpp releases dev-tools 2d ago
b9829
logs : reduce v2 ( #25078 ) server : reduce logs cont : common cont : spec cont : CMN_ -> COM_
11 -
llama.cpp releases dev-tools 2d ago
b9828
opencl: flash attention improvement ( #25069 ) opencl: rework FA kernel for f16 and f32 opencl: flash-attention prefill prepass kernels flash_attn_kv_pad_f16 pads the tail KV tile to a BLOCK_N multiple flash_attn_mask_pad_f16 pads the matching mask tile flash_attn_blk_f16…
13 -
llama.cpp releases dev-tools 2d ago
b9827
[CUDA] Added a cudaMemcpy2DAsync fast path to ggml_cuda_cpy ( #25057 ) [CUDA] Added a cudaMemcpy2DAsync fast path to ggml_cuda_cpy Add a CUDA ggml_cpy fast path for same-type, same-shape strided copies that are just 2D pitched block copies. When tensors are not fully contiguous…
14 -
llama.cpp releases dev-tools 2d ago
b9826
sycl : fix failed ut cases of norm ( #25044 )
13 -
llama.cpp releases dev-tools 2d ago
b9825
vulkan: fix step operator for 0 input ( #25036 )
9 -
llama.cpp releases dev-tools 2d ago
b9824
binaries : Improve rpc-server and export-graph-ops names. ( #25045 ) Tests are generally prefixed with -test, so rename export-graph-ops accordingly. rpc-server is probably too generic a name for /usr/bin. Because it should work with any ggml application, it is renamed to…
20 -
llama.cpp releases dev-tools 2d ago
b9823
ci : add windows-openvino to check-release ( #25022 )
11 -
llama.cpp releases dev-tools 2d ago
b9822
tests : fix test-chat-template --no-common option ( #25075 )
24 -
llama.cpp releases dev-tools 3d ago
b9821
app : allow --version, --licenses & --help ( #25054 ) Signed-off-by: Adrien Gallouët
22 -
llama.cpp releases dev-tools 3d ago
b9820
sched : reintroduce less synchronizations during split compute ( #20793 ) CUDA: Improve performance via less synchronizations between token ( #17795 ) Adds CPU-to-CUDA copy capability to ggml_backend_cuda_cpy_tensor_async() Adds function to relax sync requirements between input…
26 -
llama.cpp releases dev-tools 3d ago
b9817
openvino: Update to OV 2026.2.1, self-contained release packages, operator improvements ( #24974 ) Update to OV 2026.2.1, Make OV release packages self-contained Update to OV 2026.2.1, Make OV release packages self-contained OpenVINO Backend: Remove compute_op_type hardcoded…
23 -
llama.cpp releases dev-tools 3d ago
b9816
sync : ggml
35 -
llama.cpp releases dev-tools 3d ago
b9814
vulkan: opt mul_mat_vecq for mi50 ( #22933 )
29 -
llama.cpp releases dev-tools 3d ago
b9813
vulkan: add INTEL_XE1 arch enum and enable coopmat1 on Intel Xe-LPG Plus ( #24404 ) vulkan: add INTEL_PRE_XE2 arch enum and enable coopmat1 on Intel Xe-LPG Plus (1/3, Xe1-ARLH) Co-authored-by: Xia, Jie Co-authored-by: Liu, Russell Address…
23 -
llama.cpp releases dev-tools 3d ago
b9811
vulkan: Workaround compiler bug in conv2d coopmat2 path ( #24924 ) vulkan: Workaround compiler bug in conv2d coopmat2 path apply same workaround to CONV_3D Apply suggestion from @jeffbolznv
38 -
llama.cpp releases dev-tools 3d ago
b9810
CUDA: add cublasSgemmBatched mapping for HIP/MUSA vendor headers ( #25033 )
31 -
llama.cpp releases dev-tools 3d ago
b9804
mamba2: remove hardcoded 2x expansion factor and invalid d_inner % d_state check ( #23082 ) mamba2: remove hardcoded 2x expansion factor, support any expand value mamba2: remove invalid d_inner % d_state check (unrelated parameters) Update convert_hf_to_gguf.py: make expand…
21 -
llama.cpp releases dev-tools 4d ago
b9803
opencl: flush profiling batch at shutdown for incomplete batches ( #25016 )
7 -
llama.cpp releases dev-tools 4d ago
b9802
macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64 (ROCm 7.2) Ubuntu x64 (OpenVINO)…
15 -
llama.cpp releases dev-tools 4d ago
b9789
quant : fix quantizing moe with mtp ( #24986 )
5 -
llama.cpp releases dev-tools 4d ago
b9788
sycl : support --split-mode tensor ( #24152 ) Sycl tp stage1 ( #1 ) SYCL: tensor parallelism (--split-mode tensor) for dual-GPU Adds the comm_init/comm_free/comm_allreduce_tensor trio that the meta-backend queries via get_proc_address to enable backend-specific all-reduce,…
33 -
llama.cpp releases dev-tools 5d ago
b9786
opencl: support non-contig rows in norm ( #24965 )
8 -
llama.cpp releases dev-tools 5d ago
b9785
chat: harden caps check ( #24973 )
23 -
llama.cpp releases dev-tools 5d ago
b9784
hexagon: MUL_MAT and MUL_MAT_ID rework : 32x32 tiled weight repack, kernel-params, cached graphs ( #24954 ) hex-mm: new weight layout and fusion updates hvx-mm: unroll the new tiled vec_dots to optimize hvx register util hex-mm: optimize dyn.quant format for q8_0 and q8_1 to…
36 -
llama.cpp releases dev-tools 5d ago
b9782
common: remove unused json-partial ( #24968 )
5 -
llama.cpp releases dev-tools 5d ago
b9781
vulkan: allow reducing the graph submission batches to avoid timeouts ( #24872 )
7 -
llama.cpp releases dev-tools 5d ago
b9780
vulkan: fail the build when a shader fails to compile ( #24450 ) vulkan-shaders-gen: fail the build when a shader fails to compile vulkan-shaders-gen did not detect shader-compile subprocess failures, so a broken libggml-vulkan could be produced while the build reported success…
18 -
llama.cpp releases dev-tools 5d ago
b9777
model : Add LFM2.5-ColBERT-350M and LFM2.5-Embedding-350M ( #24913 ) model : Add LFM2.5-ColBERT-350M and LFM2.5-Embedding-350M Restore LFM2 models in README.md
7 -
llama.cpp releases dev-tools 6d ago
b9776
vulkan: Apply bias before softmax in FA, to avoid overflow ( #24909 )
26 -
llama.cpp releases dev-tools 6d ago
b9775
server : check draft context creation error ( #24922 )
13 -
llama.cpp releases dev-tools 6d ago
b9774
vulkan: support all backend tests for SQR/SQRT/SIN/COS/CLAMP/LEAKY_RELU/NORM ( #24582 ) vulkan: make SQR/SQRT/SIN/COS/CLAMP/LEAKY_RELU use unary.comp vulkan: make NORM support noncontig add noncontiguous row test cases for norm/l2_norm, handle this in the CPU backend and…
31 -
llama.cpp releases dev-tools 6d ago
b9773
vulkan: Support GET_ROWS_BACK ( #24883 )
23 -
llama.cpp releases dev-tools 6d ago
b9771
vulkan: make mul_mm ALIGNED a spec constant ( #24689 ) This trims down some of the shader variant explosion and reduces binary size.
26 -
llama.cpp releases dev-tools 6d ago
b9770
server: fix remote preset handling, add test ( #24938 ) server: add test for remote preset fix remote preset handling fix fix test
20 -
llama.cpp releases dev-tools 6d ago
b9769
vulkan: link ggml-cpu when GGML_VULKAN_CHECK_RESULTS / RUN_TESTS are enabled ( #24444 ) The result-checking and test debug paths in ggml-vulkan.cpp call ggml_graph_compute_with_ctx() to compute a CPU reference graph, but that symbol is defined in ggml-cpu, which ggml-vulkan does…
37 -
llama.cpp releases dev-tools 6d ago
b9768
model: Granite Speech Plus ( #24818 ) feat: Add conversion support for Granite Speech Plus Branch: GraniteSpeechPlus AI-usage: full (Bob, OpenCode + Qwen3.6-35b) Signed-off-by: Gabe Goodhart feat: Extend granite_speech to support plus multi-layer concatenation…
27 -
llama.cpp releases dev-tools 6d ago
b9767
ggml-webgpu: improve MTP inference by using mat-vec path for small batches ( #24811 ) ggml-webgpu: improve small batches decoding Add barrier to the NUM_COLS loop in mul-mat-vec
21 -
llama.cpp releases dev-tools 7d ago
b9765
server: improve user message detection and create checkpoints at ever…
20