llama.cpp releases
12 articles archived
b9133 · 5h ago
server, webui : support continue generation on reasoning models (#22727): Remove the throw blocking assistant prefill on reasoning models and orchestrate thinking tags around the prefilled message so the…
b9131 · 6h ago
spec : update CLI arguments for better consistency (#22964); cont : fix CLI arg message
b9129 · 8h ago
ggml-zendnn : adaptive fallback to CPU backend for small batch sizes (#22681): add runtime env var GGML_ZENDNN_ADAPTIVE_FALLBACK to control adaptive fallback (default: enabled); restore original fallback logic when adaptive fallback is disabled…
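The b9129 entry above names a runtime env var, GGML_ZENDNN_ADAPTIVE_FALLBACK, that controls the adaptive CPU fallback (enabled by default). A minimal usage sketch, assuming the common llama.cpp 0/1 env-var convention and an ordinary `llama-cli` invocation (the model path and prompt here are illustrative):

```shell
# Keep small batches on the ZenDNN backend by disabling the adaptive
# CPU fallback. The variable name comes from the release note; the
# 0/1 on-off convention is an assumption, not confirmed by the note.
GGML_ZENDNN_ADAPTIVE_FALLBACK=0 ./llama-cli -m model.gguf -p "Hello"
```

Leaving the variable unset keeps the new adaptive behavior, per the note's "default: enabled".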
b9128 · 16h ago
hexagon : eliminate scalar VTCM loads via HVX splat helpers (#22993): add hvx_vec_repl helpers and use them for the splat-from-VTCM use case; hmx-mm: optimize per-group scale handling; hmx-fa: optimize slope load from VTCM; hmx-fa: use aligned access where possible in…
b9127 · 20h ago
opencl : add opt-in Adreno xmem F16xF32 GEMM for prefill (#22755): add Adreno xmem F16xF32 GEMM for prefill; address Adreno xmem review comments; align xmem gemm kernel naming
b9124 · 23h ago
mtmd, server, common : expose modalities to /v1/models (#22952): fix build; rename to mtmd_caps
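The b9124 entry above says the server now exposes model modalities through the OpenAI-compatible `/v1/models` endpoint. A quick way to inspect this, assuming a `llama-server` instance on the default port 8080 (the exact field name and shape of the modalities data are not shown in the note, so treat the output as illustrative):

```shell
# List models from a locally running llama-server and pretty-print the
# JSON; after this release the entries should also advertise supported
# modalities (field layout unspecified in the release note).
curl -s http://localhost:8080/v1/models | python3 -m json.tool
```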
b9123 · 1d ago
ggml-webgpu : enable running gpt-oss-20b (#22906): enable gpt-oss-20b and refactor mulmat-q; disable test-backend-ops in ubuntu-24-webgpu
b9122 · 1d ago
ggml-webgpu : address precision issues for multimodal (#22808): fix(mixed-types): use f32 for precision and update the shared-memory calculation logic for f32; fix(unary): correct the gelu, gelu quick and gelu erf functions; fix(flash-attn-tile): fix the hardcoded v type…
b9119 · 1d ago
vulkan : fix Windows performance regression on Intel GPU BF16 workloads for Xe2 and newer (#22461): refactor; use l_warptile only when coopmat is available for BF16
b9118 · 1d ago
vulkan : check shared memory size for mmq shaders (#22693)
b9116 · 1d ago
mtmd : add MiMo v2.5 vision (#22883): vision support; use fused qkv for vision; fix f16 vision overflow; comment cleanups; Flash doesn't have mmproj; more cleanup; remember to use filter_tensors; fix trailing whitespace…