llama.cpp releases
456 articles archived · Visit source ↗ · RSS
-
llama.cpp releases dev-tools 1mo ago
b9222
hexagon: add support for TRI op ( #22822 ) Hexagon: TRI HVX Kernel addition to ggml hexagon HTP ops and context addressed PR review comments for TRI op hexagon: clang format hex-unary: remove merge conflict markers hex-ggml: remove duplicate op cases (merge conflict) hex-ggml:…
36 -
llama.cpp releases dev-tools 1mo ago
b9221
ggml-hexagon: add PAD op HVX kernel ( #23078 ) ggml-hexagon: add PAD op HVX kernel Implements GGML_OP_PAD on the Hexagon HTP backend using HVX vectorized kernels. Supports zero-padding and circular padding across all 4 tensor dimensions. hex-ggml: remove duplicate op cases…
26 -
llama.cpp releases dev-tools 1mo ago
b9219
common : remove hf cache migration ( #23266 ) Signed-off-by: Adrien Gallouët [email protected] macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU)…
20 -
llama.cpp releases dev-tools 1mo ago
b9216
ui: Refactor models store, MCP service, and gate logs behind VITE_DEBUG ( #23236 ) refactor: Scope console logs to DEV + VITE_DEBUG env vars refactor: skip MCP proxy probe when no server requires it refactor: suppress expected disconnect errors during MCP client shutdown…
33 -
llama.cpp releases dev-tools 1mo ago
b9213
llama: initialize pre-norm embedding mask flag ( #23256 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
21 -
llama.cpp releases dev-tools 1mo ago
b9208
sycl: route small f32 matmuls to oneMKL, bypass oneDNN ( #22150 ) Signed-off-by: Chun Tao [email protected] Co-authored-by: Chun Tao [email protected] macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux:…
18 -
llama.cpp releases dev-tools 1mo ago
b9209: sycl: scalar SWAR byte-subtract in Q6_K MMVQ dot product (#22156)
Signed-off-by: Chun Tao [email protected] Co-authored-by: Chun Tao [email protected]
11 -
llama.cpp releases dev-tools 1mo ago
b9204
feat: Support d_conv=15 for ssm-conv.cu ( #23017 ) Branch: ModalityConditionalAdapters AI-usage: none Signed-off-by: Gabe Goodhart [email protected] macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux:…
21 -
llama.cpp releases dev-tools 1mo ago
b9203
cmake : fix LLAMA_BUILD_UI logic ( #23190 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64…
4 -
llama.cpp releases dev-tools 1mo ago
b9202
cmake : do not install conversion script ( #23204 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…
5 -
llama.cpp releases dev-tools 1mo ago
b9200
llama: avoid copying logits during prompt decode in MTP ( #23198 ) llama: avoid copying logits during prompt decode in MTP review: update comment llama-graph: call set_output for t_h_pre_norm macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled)…
10 -
llama.cpp releases dev-tools 1mo ago
b9198
ggml-vulkan/CMakeLists: add a check for SPIRV-Headers ( #22009 ) ci/run: set explicit SPIR-V Headers search path for macOS vulkan CI For whatever reason, the files are under additional sub-path vulkan/ under the cmake directory, which does not match either current LunarG macOS…
8 -
llama.cpp releases dev-tools 1mo ago
b9197
vulkan: add cpy bf16 -> f32 pipelines ( #22677 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu…
23 -
llama.cpp releases dev-tools 1mo ago
b9196
vulkan: Support unaligned tensors for ROPE ( #22637 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…
36 -
llama.cpp releases dev-tools 1mo ago
b9194
vulkan: fuse SSM_CONV + BIAS + SILU ( #22653 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu…
34 -
llama.cpp releases dev-tools 1mo ago
b9193
server : honor --embd-normalize CLI arg ( #23125 ) The --embd-normalize flag was registered only for the embedding and debug examples, so llama-server rejected it and the /embedding handler used a hard-coded default of 2 (L2). Add LLAMA_EXAMPLE_SERVER to the flag's example set…
7 -
llama.cpp releases dev-tools 1mo ago
b9192
ngram : reduce noisy logs ( #23185 ) ngram : reduce noisy logs ngram : reduce noisy logs macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu…
19 -
llama.cpp releases dev-tools 1mo ago
b9190
server: (router) alloc tmp buffer on heap ( #23159 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…
16 -
llama.cpp releases dev-tools 1mo ago
b9189
server: skip device enumeration in router mode to avoid creating CUDA primary context ( #23137 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU)…
7 -
llama.cpp releases dev-tools 1mo ago
b9186
sync : ggml macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64 (ROCm 7.2) Ubuntu x64 (OpenVINO)…
15 -
llama.cpp releases dev-tools 1mo ago
b9181
vendor : update cpp-httplib to 0.45.0 ( #23103 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu…
12 -
llama.cpp releases dev-tools 1mo ago
b9180
llama + spec: MTP Support ( #22673 ) spec: support MTP fix batch size rename files cont : simplify ( #7 ) MTP: clean-up ( #9 ) MTP: clean-up review: use llama_context_type instead of llama_graph_type review: remove llama_model_has_mtp review: fix convert issues convert: fix…
37 -
llama.cpp releases dev-tools 1mo ago
b9174
ui: Restructure repo to use tools/ui folder and ui / UI / llama-ui / LLAMA_UI naming ( #23064 ) webui: Move static build output from tools/server/public to build/ui directory refactor: Move to tools/ui refactor: rename CMake variables and preprocessor defines Rename…
36 -
llama.cpp releases dev-tools 1mo ago
b9173
ci : fix release symlinks ( #23119 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64 (ROCm…
33 -
llama.cpp releases dev-tools 1mo ago
b9172
webui: Use lowercase hash for HF checksum check ( #23107 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
24 -
llama.cpp releases dev-tools 1mo ago
b9169
mtmd: add chunks and fix preproc for qwen3a ( #23073 ) mtmd: add chunks and fix preproc for qwen3a add attn_mask limit mtmd_chunk size (avoid blow up memory) correct audio tokens re-order the set_input case remove attn_mask macOS/iOS: macOS Apple Silicon (arm64) macOS Apple…
7 -
llama.cpp releases dev-tools 1mo ago
b9165
ci : fix transform of top . entry in release archive ( #23080 ) fix transform of top . entry in release archive simplify macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64…
11 -
llama.cpp releases dev-tools 1mo ago
b9163
reasoning-budget: clone should do a deep-copy ( #23095 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…
36 -
llama.cpp releases dev-tools 1mo ago
b9161
Support for Codex CLI by skipping unsupported Responses tools ( #23041 ) Support for Codex CLI by skipping unsupported Responses tools Warn on skipped Responses tools and preserve gpt-oss apply_patch rejection Revert gpt-oss apply_patch special handling macOS/iOS: macOS Apple…
29 -
llama.cpp releases dev-tools 1mo ago
b9159
ggml-hexagon: cpy: add contiguous fast-path in reshape copy ( #23076 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…
4 -
llama.cpp releases dev-tools 1mo ago
b9158
HIP: RDNA3 mma FA, faster AMD transpose, tune AMD ( #22880 ) Adds RDNA3 support to the CUDA mma FA kernel. To make the RDNA3 tensor cores work with the FP16 accumulation for VKQ the tiles they need to be 32 logical units long in direction of the attention head; for head sizes 80…
25 -
llama.cpp releases dev-tools 1mo ago
b9156
ggml-webgpu: Enable NVIDIA self-hosted CI ( #22976 ) Enabel nvidia ci for webgpu Address precision issues fix placement Relax more set_rows and div Try relaxing all f16 formatting and naming Add comment explaining max_nmse_err logic Added comment referencing pull request for…
21 -
llama.cpp releases dev-tools 1mo ago
b9151
logs : reduce ( #23021 ) logs : reduce args : fix envs server : fix build common : print verbosity level at start server : clean-up logs server : print prompt processing timings + sampling params minor : whitespaces macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon…
8 -
llama.cpp releases dev-tools 1mo ago
b9150
ggml-cpu: Add IME2 Instruction Support for the SpacemiT Backend ( #22863 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…
20 -
llama.cpp releases dev-tools 1mo ago
b9148
unicode,test: add Qwen3.5 non-backtracking tokenizer handler and regr… ( #22110 ) unicode,test: add Qwen3.5 non-backtracking tokenizer handler and regression tests Add unicode_regex_split_custom_qwen35() to src/unicode.cpp , a non-backtracking handler for Qwen3.5's [\p{L}\p{M}]+…
18 -
llama.cpp releases dev-tools 1mo ago
b9145
SYCL: fix multi-GPU system RAM exhaustion by using Level Zero allocations ( #21597 ) SYCL: fix multi-GPU system RAM exhaustion by using Level Zero allocations Replace sycl::malloc_device with zeMemAllocDevice for GPU memory allocation in the SYCL backend. sycl::malloc_device…
6 -
llama.cpp releases dev-tools 1mo ago
b9144
ggml-webgpu: only use subgroup-matrix path when head dims are divisible by sg_mat_k / sg_mat_n ( #23020 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu…
7 -
llama.cpp releases dev-tools 1mo ago
b9143
Fix for issue #22974 . Cast intermediate results to float before adding and casting the result to the destination type. Avoids half+half operator ambiguity. ( #22994 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS…
33 -
llama.cpp releases dev-tools 1mo ago
b9142
opencl: add q5_0 and q5_1 MoE for Adreno ( #22985 ) opencl: add q5_0 moe support opencl: add q5_1 moe support opencl: avoid potential leak opencl: suppress unused var warning when building for non-Adreno Co-authored-by: Li He [email protected] macOS/iOS: macOS Apple Silicon…
35 -
llama.cpp releases dev-tools 1mo ago
b9141
server, webui: accept continue_final_message flag for vLLM API compat ( #23012 ) server, webui: accept continue_final_message flag for vLLM API compat Add the continue_final_message body flag from the vLLM and transformers API. When set together with add_generation_prompt false,…
11 -
llama.cpp releases dev-tools 1mo ago
b9140
opencl: fix crash when warming up MoE on Adreno ( #22876 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
16 -
llama.cpp releases dev-tools 1mo ago
b9139
flush the gpu profile timestamp before the queryset is overflowed ( #22995 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…
6 -
llama.cpp releases dev-tools 1mo ago
b9134
download: do not exit() on error ( #23008 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64…
27 -
llama.cpp releases dev-tools 1mo ago
b9133
server, webui: support continue generation on reasoning models ( #22727 ) server, webui : support continue generation on reasoning models ( #22727 ) Remove the throw blocking assistant prefill on reasoning models and orchestrate thinking tags around the prefilled message so the…
27 -
llama.cpp releases dev-tools 1mo ago
b9131
spec : update CLI arguments for better consistency ( #22964 ) spec : update CLI arguments for better consistency cont : fix CLI arg message macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64…
33 2 -
llama.cpp releases dev-tools 1mo ago
b9129
ggml-zendnn : adaptive fallback to CPU backend for small batch sizes ( #22681 ) ggml-zendnn : add runtime env var GGML_ZENDNN_ADAPTIVE_FALLBACK to control adaptive fallback (default: enabled) ggml-zendnn : restore original fallback logic when adaptive fallback is disabled…
9 -
llama.cpp releases dev-tools 1mo ago
b9128
hexagon: eliminate scalar VTCM loads via HVX splat helpers ( #22993 ) hexagon: add hvx_vec_repl helpers and use those for splat-from-vtcm usecase hmx-mm: optimize per-group scale handling hmx-fa: optimize slope load from vtcm hmx-fa: use aligned access where possible in…
4 -
llama.cpp releases dev-tools 1mo ago
b9127
opencl: add opt-in Adreno xmem F16xF32 GEMM for prefill ( #22755 ) ggml-opencl: add Adreno xmem F16xF32 GEMM for prefill ggml-opencl: address Adreno xmem review comments ggml-opencl: align xmem gemm kernel naming Co-authored-by: Your Name [email protected] macOS/iOS: macOS Apple…
17 -
llama.cpp releases dev-tools 1mo ago
b9124
mtmd, server, common: expose modalities to /v1/models ( #22952 ) mtmd, server, common: expose modalities to /v1/models fix build rename to mtmd_caps macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux:…
11