llama.cpp releases
455 articles archived
llama.cpp releases dev-tools 26d ago
b9489
cuda: reserve space for quantized kv-cache at startup (#23907): address review comments; remove forward decl; Co-authored-by: Johannes Gäßler; remove assert in ggml-cuda.cu; Co-authored-by: Johannes Gäßler…
llama.cpp releases dev-tools 27d ago
b9488
tests : add support for qwen3 SSM archs (#24031): arch : add LLM_KV_ATTENTION_RECURRENT_LAYERS; cont : naming + TODOs. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS…
llama.cpp releases dev-tools 27d ago
b9486
ci : disable ccache for msvc windows release jobs (#23911)
llama.cpp releases dev-tools 27d ago
b9485
arg : removed unnecessary mmproj download when users pass --no-mmproj (#23425)
llama.cpp releases dev-tools 27d ago
b9484
opencl: use flat variants of q4_K and q6_K gemv for very large M (#24006)
llama.cpp releases dev-tools 27d ago
b9483
hexagon: profiler output fix and script updates (#24042): hex-ops: fix profiler output (i.e. remove the redundant NONEs); hex-prof: update profiling script to support tot.usec column
llama.cpp releases dev-tools 27d ago
b9482
model: add Mellum architecture (#23966): model: support for Mellum architecture; model: improve mellum.py formatting; model: improve mellum.py formatting once again; deps: downgrade transformers to 4.57.6 (to fix CI); deps: remove huggingface_hub dependency; deps: remove…
llama.cpp releases dev-tools 27d ago
b9481
model : support granite multilingual embeddings R2 (ibm-granite/granite-embedding-{97,311}m-multilingual-r2) (#22716). Add support for the ibm-granite/granite-embedding-{97m,311m}-multilingual-r2 embedding models: Added a version of the gpt4o tokenizer that has a fixed regex…
llama.cpp releases dev-tools 27d ago
b9480
StepFun 3.5 MTP (#23274): Simplify to single layer; Roll back core changes; fix flake8 errors; Remove scripts; modify to convention; Apply suggestions from code review; Co-authored-by: Sigbjørn Skjæret; dos2unix; Co-authored-by: Sigbjørn…
llama.cpp releases dev-tools 27d ago
b9479
common : fix state save in common_prompt_batch_decode (#23468). This commit addresses a bug in common_prompt_batch_decode that affects the session state store/restore in completion.cpp and save-load-state.cpp. The motivation…
llama.cpp releases dev-tools 27d ago
b9478
server: add SSE ping interval (#24013)
llama.cpp releases dev-tools 27d ago
b9474
ui: Add Thinking mode toggle with reasoning effort levels + improvements for Chat Form Add Action UI (#23434): feat: Add "Thinking" toggle and status icon + redesign Chat Form Actions Add panel; test: Update test reference; fix: Icon; fix: E2E test command; fix: wait for greeting…
llama.cpp releases dev-tools 27d ago
b9473
kv-cache : SWA checkpoints store only non-masked cells (#23981)
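The b9473 entry states only the idea: a checkpoint of a sliding-window-attention (SWA) cache does not need cells the window already masks. A minimal Python sketch of that filter, assuming the standard SWA rule in which the query at position `pos_max` cannot see a cell at position `pos` once `pos_max - pos >= n_swa`. The function name and the flat list of positions are hypothetical and are not llama.cpp's KV-cache API.

```python
def swa_visible_cells(cell_positions, pos_max, n_swa):
    """Hypothetical helper: keep only cells the query at pos_max can still attend to.

    Assumes the standard causal sliding-window rule: a cell at position `pos`
    is masked once pos_max - pos >= n_swa. Because positions only grow, a
    masked cell never becomes visible again, so a checkpoint can skip it.
    """
    return [pos for pos in cell_positions if 0 <= pos_max - pos < n_swa]


if __name__ == "__main__":
    cells = list(range(10))  # a toy cache holding positions 0..9
    print(swa_visible_cells(cells, pos_max=9, n_swa=4))  # [6, 7, 8, 9]
```

With a window of 4, only the last four positions survive; everything older is dead weight in a checkpoint under this assumption.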
llama.cpp releases dev-tools 28d ago
b9471
llama : deprecate llama_set_warmup (#24009): cont : fix type; Co-authored-by: Daniel Bevenius
llama.cpp releases dev-tools 28d ago
b9470
hexagon: MUL_MAT, MUL_MAT_ID, FLASH_ATTN and GDN cleanup and optimizations for latest models (#23989): hex-mm: initial support for F32 * F32 -> F32 matmuls; hex-rms-norm: fix src1 stride use in fused rms_norm_mul; hex-ops: clear spad pointers in the ops that clobber it. This fixes…
llama.cpp releases dev-tools 28d ago
b9469
hexagon: add gelu_quick (#24007)
llama.cpp releases dev-tools 28d ago
b9468
server: real-time reasoning interruption via control endpoint (#23971). Builds on the manual reasoning budget trigger from #23949. Adds a CONTROL task that mirrors the CANCEL path on the live slot and calls…
llama.cpp releases dev-tools 28d ago
b9467
clean up unused-variable warnings (#23975)
llama.cpp releases dev-tools 28d ago
b9466
opencl: fix compiler warnings for non-adreno path (#23922): opencl: fix const cast warning
llama.cpp releases dev-tools 28d ago
b9464
speculative : fix n_outputs_max and remove draft-simple auto-enable (#23988): speculative : add common_speculative_n_max helper function. Extract the speculative max-draft-size logic from server_n_outputs_max into a reusable common_speculative_n_max() function in…
llama.cpp releases dev-tools 28d ago
b9460
llama: limit max outputs of llama_context (#23861): llama: save more VRAM by reserving n_outputs == n_seqs when possible; add n_outputs_per_seq; move n_outputs_max to server-context; change ubatch to batch everywhere
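Why reserving `n_outputs == n_seqs` (b9460) can save memory is easiest to see with a rough size estimate. This sketch assumes the dominant per-output cost is one f32 logits row of `n_vocab` floats; the vocabulary size, batch size and sequence count are illustrative numbers, not values from the release.

```python
F32_BYTES = 4  # size of one float32 logit


def logits_bytes(n_outputs: int, n_vocab: int) -> int:
    """Bytes for an f32 logits tensor with one n_vocab-wide row per output."""
    return n_outputs * n_vocab * F32_BYTES


if __name__ == "__main__":
    n_vocab = 128_000  # illustrative vocabulary size
    n_batch = 512      # worst case: reserve one output per token in the batch
    n_seqs = 4         # alternative: reserve one output per sequence
    print(logits_bytes(n_batch, n_vocab))  # 262144000 (about 250 MiB)
    print(logits_bytes(n_seqs, n_vocab))   # 2048000 (about 2 MiB)
```

Under these assumptions the reservation shrinks by the ratio `n_batch / n_seqs`, which is why it only helps "when possible", i.e. when each sequence needs just one output.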
llama.cpp releases dev-tools 28d ago
b9459
metal: template GLU kernels to support f16/f32 (#23882). Drops the hardcoded f32 GLU kernels in favor of a single template. We now load/store in the native tensor type (half or float) to save memory bandwidth, but keep the actual ALU compute in float to avoid exploding math in…
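The b9459 description follows a common mixed-precision pattern: store in the narrow type, compute in the wide one. A minimal Python sketch using SwiGLU as one GLU variant; Python floats stand in for the kernel's f32 arithmetic, and `struct`'s `"e"` format rounds to IEEE half precision for the load/store steps. The numerically stable SiLU is an extra safeguard of this sketch, not something the release mentions.

```python
import math
import struct


def to_f16(x: float) -> float:
    """Round to the nearest IEEE 754 half-precision value (the storage type).

    struct's "e" format raises OverflowError for finite values beyond the
    half range, so callers are expected to keep results in range.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def silu(x: float) -> float:
    """Numerically stable SiLU: never evaluates exp() of a large positive number."""
    if x >= 0.0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def swiglu_f16(gate, up):
    """SwiGLU over half-precision inputs.

    Inputs are assumed to be f16-representable already ("loaded" as stored),
    all arithmetic runs in wide floats, and only the final product is rounded
    back to f16 for the "store".
    """
    return [to_f16(silu(g) * u) for g, u in zip(gate, up)]


if __name__ == "__main__":
    gate = [to_f16(v) for v in (1.0, -2.0, -30000.0)]
    up = [to_f16(v) for v in (0.5, 3.0, 2.0)]
    print(swiglu_f16(gate, up))
```

Only the two boundary conversions touch half precision, so memory traffic is halved while intermediate values such as `exp(-x)` never have to fit in f16.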
llama.cpp releases dev-tools 28d ago
b9458
vulkan: don't hold the device mutex while compiling pipelines (#23641). We need to hold a lock while we traverse all pipelines and lazily initialize them, but we don't need to hold it while the pipeline is being…
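The b9458 description separates two phases: the traversal and lazy-initialization bookkeeping needs the lock, while the slow compile does not. A minimal Python sketch of that pattern; `PipelineCache` and `compile_fn` are hypothetical stand-ins, not the Vulkan backend's types. A per-name `threading.Event` keeps each pipeline compiled only once even though the lock is released during compilation.

```python
import threading


class PipelineCache:
    """Hypothetical lazily compiled pipeline cache.

    The lock guards only the dictionaries; compile_fn runs with the lock
    released so other threads can use or compile other pipelines meanwhile.
    """

    def __init__(self, compile_fn):
        self._lock = threading.Lock()
        self._compile = compile_fn
        self._ready = {}    # name -> compiled pipeline
        self._pending = {}  # name -> Event set when its compile finishes

    def get(self, name):
        with self._lock:
            if name in self._ready:
                return self._ready[name]
            event = self._pending.get(name)
            is_owner = event is None
            if is_owner:
                event = threading.Event()
                self._pending[name] = event

        if not is_owner:
            # Another thread is compiling this pipeline: wait without the lock.
            event.wait()
            with self._lock:
                if name not in self._ready:
                    raise RuntimeError(f"compiling pipeline {name!r} failed")
                return self._ready[name]

        compiled = False
        try:
            pipeline = self._compile(name)  # slow work, lock not held
            compiled = True
            return pipeline
        finally:
            # Publish the result (if any) and wake waiters, even on failure.
            with self._lock:
                if compiled:
                    self._ready[name] = pipeline
                del self._pending[name]
            event.set()


if __name__ == "__main__":
    import time

    calls = []

    def slow_compile(name):
        calls.append(name)
        time.sleep(0.05)  # pretend shader compilation is expensive
        return f"pipeline<{name}>"

    cache = PipelineCache(slow_compile)
    results = []
    threads = [threading.Thread(target=lambda: results.append(cache.get("matmul")))
               for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(calls, set(results))
```

The design choice is the same trade-off the entry describes: holding the mutex only for dictionary lookups keeps contention low, at the cost of a little extra bookkeeping for in-flight compiles.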
llama.cpp releases dev-tools 28d ago
b9457
vulkan: reduce host memory lock contention (#23376): vulkan: reduce lock contention; replace unique_lock with lock_guard
llama.cpp releases dev-tools 28d ago
b9455
TP: quantized KV cache support (#23792): fix partial view; remove overly strict assert
llama.cpp releases dev-tools 28d ago
b9453
model: Add EXAONE 4.5 implementations (#21733): Add EXAONE 4.5 and Add GQA for MMproj; mtmd: EXAONE 4.5 vision markers and projector path. EXAONE 4.5 uses and for image boundaries; Qwen keeps <|vision_start|> and <|vision_end|>. Route EXAONE 4.5 through the Qwen2.5-VL-style…
llama.cpp releases dev-tools 28d ago
b9452
vulkan: Block-load Q3_K/Q6_K block data and subtract on 32b ints (#23056). Q2_K/Q3_K/Q6_K do much better when using MMVQ on Intel BMG even though they're only 2-byte aligned, and Q3_K still wins on NVIDIA as well. Mesa isn't all that great at coalescing back-to-back loads from…
llama.cpp releases dev-tools 28d ago
b9451
vulkan: Removed unused functions (#23175)
llama.cpp releases dev-tools 29d ago
b9445: ci: remove redundant or duplicate jobs (#23927)
remove redundant apple job; openvino gpu and cpu test can share the same build and machine; Update build-rpc.yml; Update build-openvino.yml; cpu any doesn't make sense as we have an arm job already, so do high perf on both x86 and arm; remove duplicate x86 vulkan; combine backend…
llama.cpp releases dev-tools 29d ago
b9444
server : handle If-None-Match weak ETags (#23916)
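For b9444, the relevant rule comes from HTTP semantics (RFC 9110): If-None-Match uses the weak comparison function, so `W/"abc"` and `"abc"` count as a match. A minimal Python sketch; the function names are hypothetical and not taken from llama.cpp's server code, and splitting the header on commas is a simplification that ignores entity tags which themselves contain commas.

```python
def opaque_tag(etag: str) -> str:
    """Drop the weak indicator so W/"abc" and "abc" compare equal."""
    etag = etag.strip()
    return etag[2:] if etag.startswith("W/") else etag


def if_none_match_hits(header_value: str, current_etag: str) -> bool:
    """Weak comparison of an If-None-Match header against the current ETag.

    True means the client's cached copy is still current, so a server would
    typically answer a GET with 304 Not Modified. "*" is treated as a match
    on the assumption that a current representation exists.
    """
    header_value = header_value.strip()
    if header_value == "*":
        return True
    current = opaque_tag(current_etag)
    # Simplification: split the list on commas.
    return any(opaque_tag(tag) == current
               for tag in header_value.split(",") if tag.strip())


if __name__ == "__main__":
    print(if_none_match_hits('W/"v1"', '"v1"'))        # True
    print(if_none_match_hits('"v0", W/"v1"', '"v1"'))  # True
    print(if_none_match_hits('"v0"', '"v1"'))          # False
```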
llama.cpp releases dev-tools 29d ago
b9442
vocab : add tokenizer support for jina-embeddings-v2-base-zh (#18756): vocab : add jina-embeddings-v2-base-zh (whitespace tokenizer); lowercase defaults to true; type fix; Co-authored-by: Sigbjørn Skjæret
llama.cpp releases dev-tools 1mo ago
b9441
ui: fix ETag truncation with MSVC compiler (#23917)
llama.cpp releases dev-tools 1mo ago
b9439
llama: only use one iGPU device by default (#23897)
llama.cpp releases dev-tools 1mo ago
b9438: webui: add custom CSS injection via config (#23904)
Register a customCSS setting in the Developer section under Custom JSON, syncable so it rides the existing ui-config pass-through. Inject the value into a single style element in the head, reactive on the setting. Lets an operator theme…
llama.cpp releases dev-tools 1mo ago
b9437
Support -fa auto in llama-bench (#23714): Make the default value of -ngl -1, similar to other tools; Update README with latest usage and examples; Address review comments
llama.cpp releases dev-tools 1mo ago
b9436
opencl: support bf16 by converting to f16 (#23839)
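The conversion named in b9436 has a simple exact first step: bf16 is the upper 16 bits of an IEEE f32, so widening is a 16-bit shift. Narrowing to f16 is where information can be lost, because bf16 keeps f32's 8-bit exponent while f16 has only 5. A minimal Python sketch on raw bit patterns; how the OpenCL backend itself rounds or handles overflow is not stated in the entry, so the saturate-to-infinity behaviour here is an assumption.

```python
import math
import struct


def bf16_to_f32(bits: int) -> float:
    """bf16 is the top half of an IEEE f32, so a 16-bit shift is exact."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def bf16_to_f16(bits: int) -> float:
    """Convert via f32, then round to half precision.

    Large magnitudes overflow the f16 range and tiny ones lose precision
    or underflow to zero, because f16's exponent range is much narrower.
    """
    x = bf16_to_f32(bits)
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses finite values beyond the f16 range; saturate to inf.
        return math.copysign(math.inf, x)


if __name__ == "__main__":
    for bits in (0x3F80, 0xC000, 0x7F7F):
        print(hex(bits), bf16_to_f16(bits))  # 1.0, -2.0, inf
```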
llama.cpp releases dev-tools 1mo ago
b9434
TP: fix granularity for Qwen 3.5/3.6 + 3 GPUs (#23843): fix afmoe TP
llama.cpp releases dev-tools 1mo ago
b9433
metal : restore im2col implementation for large kernels (#23901)
llama.cpp releases dev-tools 1mo ago
b9432
test: (test-llama-archs) log the config name first (#23885)
llama.cpp releases dev-tools 1mo ago
b9431
ci : update ios-xcode release job to macos-26 (#23906): ci : disable libcommon build from xcframework; ocd : fix name; ci : ios-xcode change to macos-26; cont : pin xcode; cont : pin xcode to minor version
llama.cpp releases dev-tools 1mo ago
b9430
ggml : add some lsx support (#23798): loongarch : optimize LSX fp16 load/store with native intrinsics. Use __lsx_vfcvtl_s_h and __lsx_vfcvt_h_s instead of scalar loops in __lsx_f16x4_load and __lsx_f16x4_store; loongarch : add LSX implementation for q8_0 dot product; loongarch :…
llama.cpp releases dev-tools 1mo ago
b9428
ci : fix s390x release job (#23898): ci : multi-thread build for ios-xcode; ocd : names
llama.cpp releases dev-tools 1mo ago
b9426: llama : do not skip iGPU when only RPC devices are present (#23868)
After #23007 reclassified integrated CUDA/HIP devices as IGPU, the device selection logic dropped the local iGPU whenever any RPC server was added, because RPC devices made model->devices non-empty. On systems where the "iGPU" is the main compute device (e.g. Strix Halo with 128…
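The b9426 description pins the bug on one condition: the local iGPU was skipped whenever `model->devices` was non-empty, and RPC devices alone made it non-empty. A hypothetical Python sketch of a selection rule that avoids this by letting only discrete local GPUs suppress the iGPU. The `Device` type and its kind labels are invented for illustration, and taking at most one iGPU mirrors b9439's "only use one iGPU device by default"; this is not llama.cpp's actual device-selection code.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    kind: str  # hypothetical labels: "gpu", "igpu", "rpc"


def select_devices(available):
    """Pick devices for a model under the assumed rule described above."""
    selected = [d for d in available if d.kind in ("gpu", "rpc")]
    # Only a discrete local GPU makes the iGPU redundant; RPC servers do not.
    has_local_gpu = any(d.kind == "gpu" for d in selected)
    if not has_local_gpu:
        igpus = [d for d in available if d.kind == "igpu"]
        selected.extend(igpus[:1])  # at most one iGPU by default
    return selected


if __name__ == "__main__":
    rpc_only = [Device("RPC0", "rpc"), Device("iGPU0", "igpu")]
    print([d.name for d in select_devices(rpc_only)])  # ['RPC0', 'iGPU0']
```

On a Strix Halo-style machine with only RPC peers, the iGPU stays selected; with a discrete GPU present, it is still dropped.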
llama.cpp releases dev-tools 1mo ago
b9415
download: add option to skip_download (#23059): fix; fix 2; if file doesn't exist, respect skip_download flag
llama.cpp releases dev-tools 1mo ago
b9414
mtmd: Add DeepSeekOCR 2 Support (#20975): mtmd: DeepSeek-OCR 2 support, with multi-tile dynamic resolution; introduced clip_image_f32::add_viewsep; address PR review; drop redundant ggml_cpy ops in both deepseekocr versions build drop no-op ggml_cont in build_sam assert…
llama.cpp releases dev-tools 1mo ago
b9413
CUDA: Check PTX version on host side to guard PDL dispatch (#23530). Checking __CUDA_ARCH_LIST__ alone is insufficient for JIT, as this variable doesn't differentiate between compiling for, say, sm_90, sm_90a or sm_90f…
llama.cpp releases dev-tools 1mo ago
b9412
server: bump timeout to 3600s (#23842): nits: change wording
llama.cpp releases dev-tools 1mo ago
b9411
model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (#23346): llama : support DeepSeek V3.2 model family (with DSA lightning indexer); convert : handle DeepseekV32ForCausalLM architecture; ggml : support for f16 GGML_OP_FILL…
llama.cpp releases dev-tools 1mo ago
b9410
llama: use f16 mask for FA to save VRAM (#23764): review: add llama_cast + formatting; simplify
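Switching the flash-attention mask from f32 to f16 (b9410) halves its footprint. A back-of-the-envelope sketch, assuming the mask holds one element per (KV cell, batch token) pair and ignoring any padding the backends apply; the sizes are illustrative, not taken from the release.

```python
def mask_bytes(n_kv: int, n_tokens: int, bytes_per_elem: int) -> int:
    """Size of an n_kv x n_tokens attention mask (padding ignored)."""
    return n_kv * n_tokens * bytes_per_elem


if __name__ == "__main__":
    n_kv, n_tokens = 32_768, 512  # illustrative context and ubatch sizes
    mib = 2 ** 20
    print(mask_bytes(n_kv, n_tokens, 4) // mib, "MiB as f32")  # 64 MiB as f32
    print(mask_bytes(n_kv, n_tokens, 2) // mib, "MiB as f16")  # 32 MiB as f16
```

The saving grows linearly with context length, so it matters most for long-context runs.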