llama.cpp releases
12 articles archived
b9133 · 5h ago
server, webui : support continue generation on reasoning models (#22727): Remove the throw blocking assistant prefill on reasoning models and orchestrate thinking tags around the prefilled message so the…
b9131 · 6h ago
spec : update CLI arguments for better consistency (#22964); cont : fix CLI arg message
b9129 · 8h ago
ggml-zendnn : adaptive fallback to CPU backend for small batch sizes (#22681): add runtime env var GGML_ZENDNN_ADAPTIVE_FALLBACK to control adaptive fallback (default: enabled); restore original fallback logic when adaptive fallback is disabled…
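The b9129 entry above names a runtime env var, GGML_ZENDNN_ADAPTIVE_FALLBACK, that controls the adaptive CPU fallback (enabled by default). A minimal usage sketch, assuming the common llama.cpp 0/1 env-var convention and an ordinary `llama-cli` invocation (the model path and prompt here are illustrative):

```shell
# Keep small batches on the ZenDNN backend by disabling the adaptive
# CPU fallback. The variable name comes from the release note; the
# 0/1 on-off convention is an assumption, not confirmed by the note.
GGML_ZENDNN_ADAPTIVE_FALLBACK=0 ./llama-cli -m model.gguf -p "Hello"
```

Leaving the variable unset keeps the new adaptive behavior, per the note's "default: enabled".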
b9128 · 16h ago
hexagon : eliminate scalar VTCM loads via HVX splat helpers (#22993): add hvx_vec_repl helpers and use them for the splat-from-VTCM use case; hmx-mm: optimize per-group scale handling; hmx-fa: optimize slope load from VTCM; hmx-fa: use aligned access where possible in…
b9127 · 20h ago
opencl : add opt-in Adreno xmem F16xF32 GEMM for prefill (#22755): add Adreno xmem F16xF32 GEMM for prefill; address Adreno xmem review comments; align xmem gemm kernel naming
b9124 · 23h ago
mtmd, server, common : expose modalities to /v1/models (#22952): fix build; rename to mtmd_caps
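The b9124 entry above says the server now exposes model modalities through the OpenAI-compatible `/v1/models` endpoint. A quick way to inspect this, assuming a `llama-server` instance on the default port 8080 (the exact field name and shape of the modalities data are not shown in the note, so treat the output as illustrative):

```shell
# List models from a locally running llama-server and pretty-print the
# JSON; after this release the entries should also advertise supported
# modalities (field layout unspecified in the release note).
curl -s http://localhost:8080/v1/models | python3 -m json.tool
```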
b9123 · 1d ago
ggml-webgpu : enable running gpt-oss-20b (#22906): enable gpt-oss-20b and refactor mulmat-q; disable test-backend-ops in ubuntu-24-webgpu
b9122 · 1d ago
ggml-webgpu : address precision issues for multimodal (#22808): fix(mixed-types): use f32 for precision and update the shared-memory calculation logic for f32; fix(unary): correct the gelu, gelu quick and gelu erf functions; fix(flash-attn-tile): fix the hardcoded v type…
b9119 · 1d ago
vulkan : fix Windows performance regression on Intel GPU BF16 workloads for Xe2 and newer (#22461): refactor; use l_warptile only when coopmat is available for BF16
b9118 · 1d ago
vulkan : check shared memory size for mmq shaders (#22693)
b9116 · 1d ago
mtmd : add MiMo v2.5 vision (#22883): vision support; use fused qkv for vision; fix f16 vision overflow; comment cleanups; Flash doesn't have mmproj; more cleanup; remember to use filter_tensors; fix trailing whitespace…