Home Status News MCP Pricing Sign in

Home Status News MCP Pricing Sign in

News / llama.cpp releases

llama.cpp releases

456 articles archived · Visit source ↗ · RSS

Sign in to subscribe

llama.cpp releases dev-tools 1mo ago

b9222

hexagon: add support for TRI op ( #22822 ) Hexagon: TRI HVX Kernel addition to ggml hexagon HTP ops and context addressed PR review comments for TRI op hexagon: clang format hex-unary: remove merge conflict markers hex-ggml: remove duplicate op cases (merge conflict) hex-ggml:…

36
llama.cpp releases dev-tools 1mo ago

b9221

ggml-hexagon: add PAD op HVX kernel ( #23078 ) ggml-hexagon: add PAD op HVX kernel Implements GGML_OP_PAD on the Hexagon HTP backend using HVX vectorized kernels. Supports zero-padding and circular padding across all 4 tensor dimensions. hex-ggml: remove duplicate op cases…

26
llama.cpp releases dev-tools 1mo ago

b9219

common : remove hf cache migration ( #23266 ) Signed-off-by: Adrien Gallouët [email protected] macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU)…

20
llama.cpp releases dev-tools 1mo ago

b9216

ui: Refactor models store, MCP service, and gate logs behind VITE_DEBUG ( #23236 ) refactor: Scope console logs to DEV + VITE_DEBUG env vars refactor: skip MCP proxy probe when no server requires it refactor: suppress expected disconnect errors during MCP client shutdown…

33
llama.cpp releases dev-tools 1mo ago

b9213

llama: initialize pre-norm embedding mask flag ( #23256 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

21
llama.cpp releases dev-tools 1mo ago

b9208

sycl: route small f32 matmuls to oneMKL, bypass oneDNN ( #22150 ) Signed-off-by: Chun Tao [email protected] Co-authored-by: Chun Tao [email protected] macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux:…

18
llama.cpp releases dev-tools 1mo ago

b9209: sycl: scalar SWAR byte-subtract in Q6_K MMVQ dot product (#22156)

Signed-off-by: Chun Tao [email protected] Co-authored-by: Chun Tao [email protected]

11
llama.cpp releases dev-tools 1mo ago

b9204

feat: Support d_conv=15 for ssm-conv.cu ( #23017 ) Branch: ModalityConditionalAdapters AI-usage: none Signed-off-by: Gabe Goodhart [email protected] macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux:…

21
llama.cpp releases dev-tools 1mo ago

b9203

cmake : fix LLAMA_BUILD_UI logic ( #23190 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64…

4
llama.cpp releases dev-tools 1mo ago

b9202

cmake : do not install conversion script ( #23204 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…

5
llama.cpp releases dev-tools 1mo ago

b9200

llama: avoid copying logits during prompt decode in MTP ( #23198 ) llama: avoid copying logits during prompt decode in MTP review: update comment llama-graph: call set_output for t_h_pre_norm macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled)…

10
llama.cpp releases dev-tools 1mo ago

b9198

ggml-vulkan/CMakeLists: add a check for SPIRV-Headers ( #22009 ) ci/run: set explicit SPIR-V Headers search path for macOS vulkan CI For whatever reason, the files are under additional sub-path vulkan/ under the cmake directory, which does not match either current LunarG macOS…

8
llama.cpp releases dev-tools 1mo ago

b9197

vulkan: add cpy bf16 -> f32 pipelines ( #22677 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu…

23
llama.cpp releases dev-tools 1mo ago

b9196

vulkan: Support unaligned tensors for ROPE ( #22637 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…

36
llama.cpp releases dev-tools 1mo ago

b9194

vulkan: fuse SSM_CONV + BIAS + SILU ( #22653 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu…

34
llama.cpp releases dev-tools 1mo ago

b9193

server : honor --embd-normalize CLI arg ( #23125 ) The --embd-normalize flag was registered only for the embedding and debug examples, so llama-server rejected it and the /embedding handler used a hard-coded default of 2 (L2). Add LLAMA_EXAMPLE_SERVER to the flag's example set…

7
llama.cpp releases dev-tools 1mo ago

b9192

ngram : reduce noisy logs ( #23185 ) ngram : reduce noisy logs ngram : reduce noisy logs macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu…

19
llama.cpp releases dev-tools 1mo ago

b9191

webui: support video files as input ( #22830 )

33
llama.cpp releases dev-tools 1mo ago

b9190

server: (router) alloc tmp buffer on heap ( #23159 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…

16
llama.cpp releases dev-tools 1mo ago

b9189

server: skip device enumeration in router mode to avoid creating CUDA primary context ( #23137 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU)…

7
llama.cpp releases dev-tools 1mo ago

b9186

sync : ggml macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64 (ROCm 7.2) Ubuntu x64 (OpenVINO)…

15
llama.cpp releases dev-tools 1mo ago

b9181

vendor : update cpp-httplib to 0.45.0 ( #23103 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu…

12
llama.cpp releases dev-tools 1mo ago

b9180

llama + spec: MTP Support ( #22673 ) spec: support MTP fix batch size rename files cont : simplify ( #7 ) MTP: clean-up ( #9 ) MTP: clean-up review: use llama_context_type instead of llama_graph_type review: remove llama_model_has_mtp review: fix convert issues convert: fix…

37
llama.cpp releases dev-tools 1mo ago

b9174

ui: Restructure repo to use tools/ui folder and ui / UI / llama-ui / LLAMA_UI naming ( #23064 ) webui: Move static build output from tools/server/public to build/ui directory refactor: Move to tools/ui refactor: rename CMake variables and preprocessor defines Rename…

36
llama.cpp releases dev-tools 1mo ago

b9173

ci : fix release symlinks ( #23119 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64 (ROCm…

33
llama.cpp releases dev-tools 1mo ago

b9172

webui: Use lowercase hash for HF checksum check ( #23107 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

24
llama.cpp releases dev-tools 1mo ago

b9169

mtmd: add chunks and fix preproc for qwen3a ( #23073 ) mtmd: add chunks and fix preproc for qwen3a add attn_mask limit mtmd_chunk size (avoid blow up memory) correct audio tokens re-order the set_input case remove attn_mask macOS/iOS: macOS Apple Silicon (arm64) macOS Apple…

7
llama.cpp releases dev-tools 1mo ago

b9165

ci : fix transform of top . entry in release archive ( #23080 ) fix transform of top . entry in release archive simplify macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64…

11
llama.cpp releases dev-tools 1mo ago

b9163

reasoning-budget: clone should do a deep-copy ( #23095 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…

36
llama.cpp releases dev-tools 1mo ago

b9161

Support for Codex CLI by skipping unsupported Responses tools ( #23041 ) Support for Codex CLI by skipping unsupported Responses tools Warn on skipped Responses tools and preserve gpt-oss apply_patch rejection Revert gpt-oss apply_patch special handling macOS/iOS: macOS Apple…

29
llama.cpp releases dev-tools 1mo ago

b9159

ggml-hexagon: cpy: add contiguous fast-path in reshape copy ( #23076 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

4
llama.cpp releases dev-tools 1mo ago

b9158

HIP: RDNA3 mma FA, faster AMD transpose, tune AMD ( #22880 ) Adds RDNA3 support to the CUDA mma FA kernel. To make the RDNA3 tensor cores work with the FP16 accumulation for VKQ the tiles they need to be 32 logical units long in direction of the attention head; for head sizes 80…

25
llama.cpp releases dev-tools 1mo ago

b9156

ggml-webgpu: Enable NVIDIA self-hosted CI ( #22976 ) Enabel nvidia ci for webgpu Address precision issues fix placement Relax more set_rows and div Try relaxing all f16 formatting and naming Add comment explaining max_nmse_err logic Added comment referencing pull request for…

21
llama.cpp releases dev-tools 1mo ago

b9151

logs : reduce ( #23021 ) logs : reduce args : fix envs server : fix build common : print verbosity level at start server : clean-up logs server : print prompt processing timings + sampling params minor : whitespaces macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon…

8
llama.cpp releases dev-tools 1mo ago

b9150

ggml-cpu: Add IME2 Instruction Support for the SpacemiT Backend ( #22863 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…

20
llama.cpp releases dev-tools 1mo ago

b9148

unicode,test: add Qwen3.5 non-backtracking tokenizer handler and regr… ( #22110 ) unicode,test: add Qwen3.5 non-backtracking tokenizer handler and regression tests Add unicode_regex_split_custom_qwen35() to src/unicode.cpp , a non-backtracking handler for Qwen3.5's [\p{L}\p{M}]+…

18
llama.cpp releases dev-tools 1mo ago

b9145

SYCL: fix multi-GPU system RAM exhaustion by using Level Zero allocations ( #21597 ) SYCL: fix multi-GPU system RAM exhaustion by using Level Zero allocations Replace sycl::malloc_device with zeMemAllocDevice for GPU memory allocation in the SYCL backend. sycl::malloc_device…

6
llama.cpp releases dev-tools 1mo ago

b9144

ggml-webgpu: only use subgroup-matrix path when head dims are divisible by sg_mat_k / sg_mat_n ( #23020 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu…

7
llama.cpp releases dev-tools 1mo ago

b9143

Fix for issue #22974 . Cast intermediate results to float before adding and casting the result to the destination type. Avoids half+half operator ambiguity. ( #22994 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS…

33
llama.cpp releases dev-tools 1mo ago

b9142

opencl: add q5_0 and q5_1 MoE for Adreno ( #22985 ) opencl: add q5_0 moe support opencl: add q5_1 moe support opencl: avoid potential leak opencl: suppress unused var warning when building for non-Adreno Co-authored-by: Li He [email protected] macOS/iOS: macOS Apple Silicon…

35
llama.cpp releases dev-tools 1mo ago

b9141

server, webui: accept continue_final_message flag for vLLM API compat ( #23012 ) server, webui: accept continue_final_message flag for vLLM API compat Add the continue_final_message body flag from the vLLM and transformers API. When set together with add_generation_prompt false,…

11
llama.cpp releases dev-tools 1mo ago

b9140

opencl: fix crash when warming up MoE on Adreno ( #22876 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

16
llama.cpp releases dev-tools 1mo ago

b9139

flush the gpu profile timestamp before the queryset is overflowed ( #22995 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…

6
llama.cpp releases dev-tools 1mo ago

b9134

download: do not exit() on error ( #23008 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64…

27
llama.cpp releases dev-tools 1mo ago

b9133

server, webui: support continue generation on reasoning models ( #22727 ) server, webui : support continue generation on reasoning models ( #22727 ) Remove the throw blocking assistant prefill on reasoning models and orchestrate thinking tags around the prefilled message so the…

27
llama.cpp releases dev-tools 1mo ago

b9131

spec : update CLI arguments for better consistency ( #22964 ) spec : update CLI arguments for better consistency cont : fix CLI arg message macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64…

33 2
llama.cpp releases dev-tools 1mo ago

b9129

ggml-zendnn : adaptive fallback to CPU backend for small batch sizes ( #22681 ) ggml-zendnn : add runtime env var GGML_ZENDNN_ADAPTIVE_FALLBACK to control adaptive fallback (default: enabled) ggml-zendnn : restore original fallback logic when adaptive fallback is disabled…

9
llama.cpp releases dev-tools 1mo ago

b9128

hexagon: eliminate scalar VTCM loads via HVX splat helpers ( #22993 ) hexagon: add hvx_vec_repl helpers and use those for splat-from-vtcm usecase hmx-mm: optimize per-group scale handling hmx-fa: optimize slope load from vtcm hmx-fa: use aligned access where possible in…

4
llama.cpp releases dev-tools 1mo ago

b9127

opencl: add opt-in Adreno xmem F16xF32 GEMM for prefill ( #22755 ) ggml-opencl: add Adreno xmem F16xF32 GEMM for prefill ggml-opencl: address Adreno xmem review comments ggml-opencl: align xmem gemm kernel naming Co-authored-by: Your Name [email protected] macOS/iOS: macOS Apple…

17
llama.cpp releases dev-tools 1mo ago

b9124

mtmd, server, common: expose modalities to /v1/models ( #22952 ) mtmd, server, common: expose modalities to /v1/models fix build rename to mtmd_caps macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux:…

11

Page 9 of 10 · 456 articles ← Newer Older →

Product

Pricing
Roadmap
Changelog
Incidents

Resources

News RSS
MCP RSS
MCP releases RSS
Incidents RSS
Changelog RSS

Project

About
API
Contact

Legal

Privacy
Terms
Security

Prismix · © 2026 · AI Hub

All product names and logos are trademarks of their respective owners.

Send feedback

Name (optional)

Email *

Message *