llama.cpp releases
454 articles archived · Visit source ↗ · RSS
-
llama.cpp releases dev-tools 15d ago
b9632
jinja : add count/d/e filter aliases ( #24606 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…
9 -
llama.cpp releases dev-tools 15d ago
b9631
cli : fix not copying preserved tokens ( #24258 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
6 -
llama.cpp releases dev-tools 16d ago
b9630
Add cohere2moe to llama-vocab for TINY_AYA ( #24601 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
16 -
llama.cpp releases dev-tools 16d ago
b9628
add sycl to check-release ( #24583 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64…
21 -
llama.cpp releases dev-tools 16d ago
b9627
ui : fix llama-ui-embed crash when no asset dir is given ( #24597 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…
25 -
llama.cpp releases dev-tools 16d ago
b9626
Add arch support for cohere2-MoE ( #24260 ) Add arch support for cohere2-MoE Removed redundant gating_func checks Changed ffn lookup to prefer prefix_dense_intermediate_size Renamed arch to cohere2moe Removed redundant lmhead check and chat template changes Removed…
36 -
llama.cpp releases dev-tools 16d ago
b9625
jinja : fix negative step slice with start/stop values ( #24580 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…
27 -
llama.cpp releases dev-tools 16d ago
b9624
ui: build-time gzip compression ( #24571 ) ui: keep original file name and path fix nocache ui: build-time gzip compression macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU)…
19 -
llama.cpp releases dev-tools 16d ago
b9623
jinja : fix split and replace with empty first arg ( #24574 ) fix split and replace with empty first arg fix reserve size macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU)…
13 -
llama.cpp releases dev-tools 16d ago
b9622
vulkan: support non-contig unary/glu ops ( #24215 ) vulkan: support non-contig unary/glu ops Change unary/glu ops to pass in all strides and use fastdiv for the index calculation. Put all unary ops in one file, similar to glu, to share the code. codex went ahead and added expm1…
15 -
llama.cpp releases dev-tools 16d ago
b9621
ui: keep original file name and path ( #24568 ) ui: keep original file name and path fix nocache macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu…
35 -
llama.cpp releases dev-tools 16d ago
b9620
server: clean up static assets handling ( #24550 ) server: clean up static assets handling nits simplify file name handling, use static file name everywhere cmake/ui : bundle UI assets in an archive ui : run prettier on post-build.js Co-authored-by: Alde Rojas [email protected]…
12 -
llama.cpp releases dev-tools 17d ago
b9616
ci : unbreak release harder ( #24545 ) unbreak release harder missed one remove missing test for now macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu…
29 -
llama.cpp releases dev-tools 17d ago
b9611
fit : avoid including llama-ext.h in fit.h ( #24506 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
28 -
llama.cpp releases dev-tools 17d ago
b9610
sync : ggml macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64 (ROCm 7.2) Ubuntu x64…
22 -
llama.cpp releases dev-tools 17d ago
b9608
vendor : update cpp-httplib to 0.47.0 ( #24395 ) Signed-off-by: Adrien Gallouët [email protected] macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu…
13 -
llama.cpp releases dev-tools 17d ago
b9606
spec: add EAGLE3 speculative decoding support ( #18039 ) llama : enable layer input extraction spec: support eagle3 eagle3: fix params bug eagle3: support Gemma4 eagle3 from RedHatAI eagle3: set sync when get features from target Co-authored-by: tnhnyzc…
24 -
llama.cpp releases dev-tools 18d ago
b9605
ggml: support concat for scalar types at cuda backend ( #24011 ) cuda: support concat for scalar types Update concat.cu fix metal ci issue macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux:…
20 -
llama.cpp releases dev-tools 18d ago
b9604
[SYCL] Fix CI build & release for SYCL backend ( #24387 ) restore SYCL build and release, remove github cache modify for test only verify the ccache is used remove debug code change rm duplicate action, update key in ccache add action ccache-clear after building in both ubuntu…
21 -
llama.cpp releases dev-tools 18d ago
b9603
opencl: add q5_0/q5_1 gemm and gemv kernels for Adreno ( #24319 ) opencl: add q5_0 adreno support opencl: add q5_1 adreno support opencl: cosmetic fix Co-authored-by: Li He [email protected] macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled)…
13 -
llama.cpp releases dev-tools 18d ago
b9601
vulkan: ifdef eMesaHoneykrisp (build fix) ( #24479 ) Fixes build/CI after #24306 . macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu…
13 -
llama.cpp releases dev-tools 18d ago
b9596
server: skip unused log lines on router mode ( #24463 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
32 -
llama.cpp releases dev-tools 19d ago
b9594
vocab : refactor normalizer flags into options struct, add strip_accents ( #24371 ) vocab : refactor normalizer flags into options struct, add strip_accents Update src/llama-vocab.h Co-authored-by: Sigbjørn Skjæret [email protected] Update src/llama-vocab.cpp…
27 -
llama.cpp releases dev-tools 19d ago
b9592
vendor : update LibreSSL to 4.3.2 ( #24397 ) Signed-off-by: Adrien Gallouët [email protected] macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x…
35 -
llama.cpp releases dev-tools 19d ago
b9591
Remove padding and multiple D2D copies for MTP ( #24086 ) Make ggml_gated_delta_net take only the initial recurrent state (D, 1, n_seqs) and passes the snapshot count K as an op parameter instead of inferring it from state->ne[1]. Remove the padding hack and copy all emitted…
8 -
llama.cpp releases dev-tools 19d ago
b9590
chat: fix LFM2/LFM2.5 ignoring json_schema ( #24377 ) The LFM2 specialized template handler only built a grammar for tool-calling, silently ignoring json_schema from response_format. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED…
6 -
llama.cpp releases dev-tools 20d ago
b9587
speculative : fix "ngram-map-k4v" name in logging ( #24253 ) This is a non-functional change. When using --spec-type ngram-map-k4v , the log messages at startup and runtime say ngram-map-k . Added logic in the in the constructor of common_speculative_impl_ngram_map_k to pass the…
16 -
llama.cpp releases dev-tools 20d ago
b9586: webui: implement pinned conversations support (#21387)
webui: implement pinned conversations support webui: linter/prettier pass Fix the unused handleMobileSidebarItemClick from the component. the search should find pinned conversations as well Co-authored-by: Pascal [email protected] Co-authored-by: Pascal…
24 -
llama.cpp releases dev-tools 20d ago
b9585
graph: Fix granite speech model inference by applying embedding scale when deepstack is not used ( #24357 ) llama-graph : apply embedding scale when deepstack is not used nits: remove non-existant hunyuan-vl from the tests apply suggestion from @gabe-l-hart Co-authored-by: Xuan…
25 -
llama.cpp releases dev-tools 20d ago
b9584
ci : fix windows release ( #24369 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64…
22 -
llama.cpp releases dev-tools 20d ago
b9581
vulkan: reduce iq1 shared memory usage for mul_mm ( #24287 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…
21 -
llama.cpp releases dev-tools 20d ago
b9580
vulkan: add v_dot2_f32_f16 support in matrix-matrix multiplication and Flash Attention ( #24123 ) vulkan: add support for valve fp16 dot2 extension use macro for dot2 path choice properly check for the feature add dot_product abstraction to reduce preprocessor branching…
10 -
llama.cpp releases dev-tools 20d ago
b9578
mtmd: refactor video subproc handling ( #24316 ) mtmd: refactor video subproc handling Update tools/mtmd/mtmd-helper.cpp Co-authored-by: Mikko Juola [email protected] Co-authored-by: Mikko Juola [email protected] macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64,…
11 -
llama.cpp releases dev-tools 20d ago
b9577
server: log prompts to directory ( #22031 ) server: log prompts to directory Add --log-prompts-dir to write each prompt to a separate text file in the specified directory. Apply suggestion from @ngxson Co-authored-by: Xuan-Son Nguyen [email protected] macOS/iOS: macOS Apple…
35 -
llama.cpp releases dev-tools 20d ago
b9575
ggml : add GGML_OP_COL2IM_1D ( #24206 ) cpu: add GGML_OP_COL2IM_1D Add the overlap-add (scatter-add) step of a 1D transposed convolution. A ConvTranspose1d factorizes as a GEMM followed by col2im: a weight pre-permuted to [IC, K OC] is contracted against the [IC, T_in] input…
4 -
llama.cpp releases dev-tools 20d ago
b9574
server : do not clear slots without unified KV cache ( #24190 ) Always export idle slots to RAM Without this, a slot's VRAM cache may not be written to RAM. If this slot happens to be busy then later on, this triggers needless preprocessing in another slot. cont : clean-up…
33 -
llama.cpp releases dev-tools 20d ago
b9573
models : fix plamo2 attention_key/value_length regression ( #24317 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…
15 -
llama.cpp releases dev-tools 21d ago
b9572
ggml-cpu : fix rms_norm_back wrong output under in-place aliasing ( #24305 ) ggml-cpu : fix rms_norm_back wrong output under in-place aliasing cont : clean-up comment Co-authored-by: Georgi Gerganov [email protected] macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon…
27 -
llama.cpp releases dev-tools 21d ago
b9571
Remove case for GGML_TYPE_Q4_K in mvvq.cu ( #23528 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
7 -
llama.cpp releases dev-tools 21d ago
b9570
ggml-webgpu: Add clang-format job ( #24308 ) Add clang-format job try local formatting macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU)…
34 -
llama.cpp releases dev-tools 21d ago
b9568
mtp: support for gemma-4 E2B and E4B assistants ( #24282 ) models: update converter to support smaller assistants models: add masked_embd tensors to gemma4-assist arch gemma-4: remove temp debug for conversion gemma-4-mtp: filter out masked_embedding tensors during conversion…
23 -
llama.cpp releases dev-tools 21d ago
b9567
server : do not parse when flushing http headers ( #24281 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…
26 -
llama.cpp releases dev-tools 21d ago
b9566
graph: guard iswa kq_mask on its own buffer ( #24294 ) A SWA-only draft head (e.g. StepFun MTP) leaves the base sub-cache empty, so its kq_mask buffer stays null and asserts at load. Guard each mask on its own buffer in set_input and can_reuse, base and swa. Co-authored-by:…
23 -
llama.cpp releases dev-tools 21d ago
b9565
[ggml-webgpu] Handle buffer overlap / buffer aliasing for concat operator ( #24000 ) Only run webgpu CI on my fork Add webgpu only workflow handle buffer overlap case for concat operator restore build-webgpu.yml Co-Authored-By: Claude Sonnet 4.6 [email protected] Run…
14 -
llama.cpp releases dev-tools 21d ago
b9564
[ggml-webgpu] Implement 2D workgroups for scale, binary, and unary ops ( #24044 ) Only run webgpu CI on my fork Add webgpu only workflow Implement 2d workgroups for more operations fix Fix type Move back to global_invocation_id macOS/iOS: macOS Apple Silicon (arm64) macOS Apple…
24 -
llama.cpp releases dev-tools 21d ago
b9562
mtmd : add video input support ( #24269 ) wip ok: lazy bitmap API remember to free lazy text wip add mtmd_helper_video support video input on server (base64 input) add MTMD_VIDEO config add timestamp update CLI cli: allow auto-completion for video add --video arg fix build…
22 -
llama.cpp releases dev-tools 21d ago
b9561
sync : ggml macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64 (ROCm 7.2) Ubuntu x64…
13 -
llama.cpp releases dev-tools 21d ago
b9559
cli: fix spinner not show during prompt processing ( #24283 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…
10