llama.cpp releases
llama.cpp releases dev-tools 7d ago
b9763
server : Add id to tool call responses api (#24882) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…
llama.cpp releases dev-tools 7d ago
b9761
server: (router) move model downloading to dedicated process (#24834) server: real-time model load progress tracking via /models/sse update docs server: move model download to child process rm unused fix most problems clean up nit fixes fix test case do not detach() thread…
llama.cpp releases dev-tools 7d ago
b9760
server: refactor/generalize input file schema (#24299) server: refactor/generalize input file schema wire up input_video, accept raw base64 nits nits (2) fix windows
llama.cpp releases dev-tools 7d ago
b9758
[SYCL] support bf16 on bin_bcast OP and unary OPs (#24838) support bf16 on bin_bcast OP and unary OPs support Intel compilers older than 2026.0
llama.cpp releases dev-tools 7d ago
b9757
sampling : remove unconditional softmax+sort in top-n-sigma sampler (#22645)
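The b9757 entry only names the change. The C++ sketch below shows why the cutoff itself needs neither a softmax nor a full sort. It assumes the usual top-n-sigma definition: keep logits that are at least `max_logit - n * sigma`, where sigma is the standard deviation of the logits. The function name `top_n_sigma_mask` is hypothetical and is not llama.cpp's sampler API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper, not llama.cpp's sampler API. Keeps tokens whose logit is
// at least max_logit - n * sigma, where sigma is the standard deviation of the
// finite logits. Linear passes only: no softmax, no sort.
void top_n_sigma_mask(std::vector<float> & logits, float n) {
    const float neg_inf = -std::numeric_limits<float>::infinity();
    float max_logit = neg_inf;
    double sum = 0.0;
    std::size_t count = 0;
    for (float l : logits) {
        if (!std::isfinite(l)) continue; // skip tokens already masked by earlier samplers
        max_logit = std::max(max_logit, l);
        sum += l;
        ++count;
    }
    if (count == 0) return;
    const double mean = sum / static_cast<double>(count);
    double var = 0.0;
    for (float l : logits) {
        if (std::isfinite(l)) var += (l - mean) * (l - mean);
    }
    const double sigma = std::sqrt(var / static_cast<double>(count));
    const double threshold = max_logit - n * sigma;
    // Mask everything below the threshold; survivors keep their original logits.
    for (float & l : logits) {
        if (std::isfinite(l) && l < threshold) l = neg_inf;
    }
}
```

Because the threshold depends only on the maximum and the spread of the logits, any later sorting or normalisation only has to touch the surviving tokens.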
llama.cpp releases dev-tools 7d ago
b9756
server: fix edit_file crash on append at end of file (line_start -1) (#24893) line_start -1 was normalized to n+1, so an append inserted at lines.begin() + n + 1, one past end(), causing a heap-buffer-overflow in vector::_M_range_insert. Normalize -1 to n (insert at end()), restrict -1 to…
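The off-by-one in b9756 comes down to the fact that `begin() + n` is already `end()`, which is the last valid insertion point. The minimal sketch below assumes `line_start` is a 0-based insertion index in `[0, n]`, with -1 meaning append. `insert_lines` is a hypothetical helper, not the server's `edit_file` code.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper illustrating the fix. line_start is treated as a 0-based
// insertion index in [0, n] (an assumption), and -1 means "append".
void insert_lines(std::vector<std::string> & lines, long line_start,
                  const std::vector<std::string> & new_lines) {
    const long n = static_cast<long>(lines.size());
    long pos;
    if (line_start == -1) {
        pos = n;              // append: insert at end(), not end() + 1
    } else if (line_start >= 0 && line_start <= n) {
        pos = line_start;
    } else {
        throw std::out_of_range("line_start out of range");
    }
    // Any pos in [0, n] yields an iterator in [begin(), end()], which insert() accepts.
    lines.insert(lines.begin() + pos, new_lines.begin(), new_lines.end());
}
```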
llama.cpp releases dev-tools 8d ago
b9755
docs/android.md: Add dependency libandroid-spawn for building in te…
llama.cpp releases dev-tools 8d ago
b9754
common/peg : implement ac parser for stricter grammar generation (#24869) common/peg : implement ac parser cont : extract functions cont : tidy up cont : remove a test cont : move ac() def
llama.cpp releases dev-tools 8d ago
b9753
server: fix report progress for loading spec models, add "stages" list (#24870) server: fix report progress for loading spec models, add "stages" list improve nits nits 2
llama.cpp releases dev-tools 8d ago
b9752
server: refactor batch construction (#24843) server: refactor batch construction wip wip 2 wip 3 wip 4 add abort_all_slots handle batch full more carefully fix assert rm debug log small nits (debug) add timings debug: force llama_synchronize for accurate timings address…
llama.cpp releases dev-tools 8d ago
b9751
mtmd: fix mtmd_get_memory_usage (#24867)
llama.cpp releases dev-tools 8d ago
b9750
jinja : implement call statement (#24847) implement call statement undo unintended change de-lambda simplify move caller context inside function handler
llama.cpp releases dev-tools 8d ago
b9748
server: add "verbose" field to schema (#24864)
llama.cpp releases dev-tools 8d ago
b9747
server: real-time model load progress tracking via /models/sse (#24828) server: real-time model load progress tracking via /models/sse update docs add mutex for notify_to_router correct docs
llama.cpp releases dev-tools 8d ago
b9745
spec : Support Step3.5/3.7 flash mtp3 (#24340) add mtp_layer_offset + include nextn flags in graph reuse add llama_set_mtp_layer_offset + llama_model_n_nextn_layer API offset head select + require all MTP blocks speculative multi-head process() speculative multi-head draft()…
llama.cpp releases dev-tools 9d ago
b9744
common/peg : refactor until gbnf grammar generation (#24839) common/peg : refactor until gbnf grammar into an ac automaton cont : add a test with multiple strings cont : pad state with 0s so rules line up cont : clean up comments cont : use set everywhere cont : inline state…
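The b9754 and b9744 entries refer to an "ac" automaton for "until" rules. Assuming "ac" means Aho-Corasick (the listing does not spell it out), the sketch below shows how a single automaton can find where the first of several delimiters starts, in one left-to-right pass. It illustrates only the automaton. It does not generate GBNF, and the class name `AhoCorasick` is hypothetical, not the common/peg code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// Minimal byte-level Aho-Corasick automaton (illustrative; not the common/peg code).
class AhoCorasick {
public:
    explicit AhoCorasick(const std::vector<std::string> & patterns) {
        nodes_.emplace_back(); // root
        for (const std::string & p : patterns) {
            int cur = 0;
            for (unsigned char c : p) {
                if (nodes_[cur].next[c] == -1) {
                    nodes_.emplace_back();
                    nodes_.back().depth = nodes_[cur].depth + 1;
                    nodes_[cur].next[c] = static_cast<int>(nodes_.size()) - 1;
                }
                cur = nodes_[cur].next[c];
            }
            nodes_[cur].out_len = nodes_[cur].depth; // a whole pattern ends here
        }
        // Breadth-first pass: compute failure links and fill in missing
        // transitions so that matching never has to backtrack.
        std::queue<int> q;
        for (int c = 0; c < 256; ++c) {
            int & child = nodes_[0].next[c];
            if (child == -1) { child = 0; } else { q.push(child); }
        }
        while (!q.empty()) {
            const int u = q.front();
            q.pop();
            const int f = nodes_[u].fail;
            if (nodes_[u].out_len == 0) {
                nodes_[u].out_len = nodes_[f].out_len; // longest pattern that is a suffix
            }
            for (int c = 0; c < 256; ++c) {
                const int v = nodes_[u].next[c];
                if (v == -1) {
                    nodes_[u].next[c] = nodes_[f].next[c];
                } else {
                    nodes_[v].fail = nodes_[f].next[c];
                    q.push(v);
                }
            }
        }
    }

    // Index where the first delimiter occurrence starts ("first" = the one that
    // completes earliest; ties go to the longest). Returns text.size() when no
    // delimiter occurs, i.e. the whole input may be consumed.
    std::size_t until(const std::string & text) const {
        int state = 0;
        for (std::size_t i = 0; i < text.size(); ++i) {
            state = nodes_[state].next[static_cast<unsigned char>(text[i])];
            if (nodes_[state].out_len > 0) {
                return i + 1 - static_cast<std::size_t>(nodes_[state].out_len);
            }
        }
        return text.size();
    }

private:
    struct Node {
        std::array<int, 256> next;
        int fail = 0;
        int depth = 0;
        int out_len = 0; // length of the longest pattern ending at this state (0 = none)
        Node() { next.fill(-1); }
    };
    std::vector<Node> nodes_;
};
```

Each input byte costs one table lookup, so the scan is linear in the text length no matter how many delimiters there are. The price is a 256-entry transition table per trie node.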
llama.cpp releases dev-tools 9d ago
b9743
common/json-schema-to-grammar : align spacing rules with parsers (#24835)
llama.cpp releases dev-tools 9d ago
b9742
fix(hexagon): use padded stride for ssm-conv weights (#24470)
llama.cpp releases dev-tools 9d ago
b9741
llama : use LLM_KV for quantization_version & file_type (#24802) Signed-off-by: Adrien Gallouët
llama.cpp releases dev-tools 9d ago
b9740
arg: try fixing test-args-parser randomly fails (#24826) arg: try fixing test-args-parser randomly fails return ref try triggering the workflow exception wrapper wip test test 2 arg: guard win32 utf8 argv override make_utf8_argv rebuilds argv from GetCommandLineW to fix utf8…
llama.cpp releases dev-tools 9d ago
b9739
release: add missing link for win opencl adreno arm64 (#24809)
llama.cpp releases dev-tools 9d ago
b9738
server: avoid forwarding auth headers in CORS proxy (#24373) server: avoid forwarding auth headers in CORS proxy format fix test fix e2e test Co-authored-by: Xuan Son Nguyen
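The b9738 entry does not say which headers are dropped. The hedged sketch below assumes the proxy should strip credential-bearing headers such as `Authorization`, `Proxy-Authorization` and `Cookie`. Both the blocked list and the name `strip_credentials` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <set>
#include <string>

// Lower-case a header name; HTTP header names are case-insensitive.
static std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical sketch: copy request headers for the upstream request, dropping
// credentials meant for this server. The blocked list is an assumption.
std::map<std::string, std::string> strip_credentials(
        const std::map<std::string, std::string> & headers) {
    static const std::set<std::string> blocked = {
        "authorization", "proxy-authorization", "cookie",
    };
    std::map<std::string, std::string> out;
    for (const auto & kv : headers) {
        if (blocked.count(to_lower(kv.first)) == 0) {
            out.emplace(kv.first, kv.second);
        }
    }
    return out;
}
```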
llama.cpp releases dev-tools 9d ago
b9736
model : glm-dsa load DSA indexer tensors as optional (#24770) GLM-5.2 ships the DSA "lightning indexer" on only a subset of layers (the "full" layers; others omit it), but the GLM_DSA loader created the five indexer tensors on every layer as required, so loading any GLM-5.2…
llama.cpp releases dev-tools 9d ago
b9737
docker : prebuild web UI for s390x build [no release] (#24829)
llama.cpp releases dev-tools 10d ago
b9733
ggml-webgpu: add adapter toggles for F16 on Vulkan + NVIDIA
llama.cpp releases dev-tools 10d ago
b9732
server: refactor child --> router communication (#24821) server: refactor child --> router communication fix wakeup case add docs improve update_status() nits
llama.cpp releases dev-tools 10d ago
b9731
server : optimize get_token_probabilities (#24796) Use std::partial_sort to order only the requested top-n tokens instead of the full vocabulary. logprobs sort (vocab=128000, n_top=0, iters=100): full sort 8555.6 us/op, partial sort 704.3 us/op. Signed-off-by: Adrien Gallouët…
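The technique named in b9731 can be sketched with the standard library alone. `top_n_tokens` below is a hypothetical helper, not the server's `get_token_probabilities`. It returns token ids with their raw logits and leaves the probability computation out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Return the n_top (token id, logit) pairs with the highest logits, best first.
// std::partial_sort orders only the first n_top elements, roughly
// O(V log n_top) comparisons instead of O(V log V) for a full sort.
std::vector<std::pair<int, float>> top_n_tokens(const std::vector<float> & logits,
                                                std::size_t n_top) {
    std::vector<std::pair<int, float>> cand;
    cand.reserve(logits.size());
    for (std::size_t i = 0; i < logits.size(); ++i) {
        cand.emplace_back(static_cast<int>(i), logits[i]);
    }
    n_top = std::min(n_top, cand.size()); // never ask for more than the vocabulary
    std::partial_sort(cand.begin(), cand.begin() + n_top, cand.end(),
                      [](const std::pair<int, float> & a, const std::pair<int, float> & b) {
                          return a.second > b.second;
                      });
    cand.resize(n_top); // the tail is left in unspecified order, so drop it
    return cand;
}
```

The saving grows with the ratio between the vocabulary size and the number of requested tokens. That is why ordering only the requested top-n tokens pays off for large vocabularies.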
llama.cpp releases dev-tools 10d ago
b9730
mtmd, arg: fix utf8 handling on windows (#24779) mtmd, arg: fix utf8 handling on windows also fix ggml_fopen fix build fail also fix CLI
llama.cpp releases dev-tools 10d ago
b9729
server: remove all internal mentions about "webui" (#24817)
llama.cpp releases dev-tools 10d ago
b9728
arg: Add comment line support to --api-key-file (#23168)
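The b9728 entry does not give the comment syntax. The sketch below assumes that lines whose first non-blank character is `#` are comments, that blank lines are ignored, and that surrounding whitespace is trimmed. `parse_api_keys` is hypothetical, so check the llama-server documentation for the actual rules.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical parser: one key per line. Blank lines and lines whose first
// non-blank character is '#' are skipped ('#' is an assumption).
std::vector<std::string> parse_api_keys(std::istream & in) {
    std::vector<std::string> keys;
    std::string line;
    while (std::getline(in, line)) {
        const auto first = line.find_first_not_of(" \t\r");
        if (first == std::string::npos || line[first] == '#') continue;
        const auto last = line.find_last_not_of(" \t\r");
        keys.push_back(line.substr(first, last - first + 1)); // trimmed key
    }
    return keys;
}
```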
llama.cpp releases dev-tools 10d ago
b9727
vendor : update cpp-httplib to 0.48.0 (#24787)
llama.cpp releases dev-tools 10d ago
b9726
server: add --agent arg, remove redundant webui naming compat (#24801) server: add --agent arg, remove redundant webui naming compat correct env fix the test llama-gen-docs nits: wordings
llama.cpp releases dev-tools 10d ago
b9725
docker : build the UI (#24794) docker : build the UI cont : use existing APP_VERSION
llama.cpp releases dev-tools 10d ago
b9724
mtmd: several bug fixes (#24784) mtmd: several bug fixes fix build fix gemma4ua add sanity check in get_u32() fix build (2) area() avoid overflow
llama.cpp releases dev-tools 10d ago
b9723
spec: support eagle3 for qwen3.5 & 3.6 (#24593) spec: support qwen3.5 & 3.6 eagle3 draft eagle3: Add deferred boundary checkpoints restore support for hybrid models apply suggestions Co-authored-by: Georgi Gerganov spec: adapt to API change spec: fix naming…
llama.cpp releases dev-tools 10d ago
b9722
server: fix non-bound n_discard value (ctx shifting) (#24786) server: fix non-bound n_discard value Update tools/server/server-context.cpp Co-authored-by: Georgi Gerganov
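The b9722 entry gives no details of the fix. The sketch below rests on stated assumptions. During a context shift, only the `n_past - n_keep` tokens after the kept prefix can be discarded, so a user-supplied `n_discard` is clamped to that range, and 0 falls back to half of it. `bounded_n_discard` and the fallback rule are assumptions, not the server's code.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical sketch of bounding n_discard during a context shift.
// n_past: tokens currently in the context; n_keep: prefix tokens that must survive.
int bounded_n_discard(int requested, int n_past, int n_keep) {
    const int n_left = std::max(0, n_past - n_keep); // tokens eligible for discarding
    if (requested <= 0) {
        return n_left / 2;                 // assumed default: drop half of them
    }
    return std::min(requested, n_left);    // never discard past the end of the context
}
```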
llama.cpp releases dev-tools 10d ago
b9721
sync : ggml
llama.cpp releases dev-tools 10d ago
b9718
server : consolidate slot selection into get_available_slot (#24755) Absorb get_slot_by_id logic into get_available_slot so slot selection is handled by a single function call. When a specific slot id is requested, the LCP similarity check still runs to enable proper prompt…
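To illustrate the LCP similarity check mentioned in b9718, the sketch below scores each slot's cached prompt by the length of its longest common prefix with the new prompt, divided by the new prompt's length. The scoring formula, the `min_similarity` threshold and the function names are assumptions. Slot availability is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Length of the longest common prefix of two token sequences.
std::size_t lcp(const std::vector<int> & a, const std::vector<int> & b) {
    std::size_t n = 0;
    while (n < a.size() && n < b.size() && a[n] == b[n]) ++n;
    return n;
}

// Hypothetical selection rule: index of the slot whose cache shares the largest
// fraction of the new prompt as a prefix, or -1 if no slot shares a non-empty
// prefix that reaches min_similarity.
int best_slot_by_lcp(const std::vector<std::vector<int>> & slot_caches,
                     const std::vector<int> & prompt, float min_similarity) {
    if (prompt.empty()) return -1;
    int best = -1;
    float best_sim = 0.0f;
    for (std::size_t i = 0; i < slot_caches.size(); ++i) {
        const float sim = static_cast<float>(lcp(slot_caches[i], prompt)) / prompt.size();
        if (sim >= min_similarity && sim > best_sim) {
            best_sim = sim;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```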
llama.cpp releases dev-tools 11d ago
b9717
ggml-cpu: support K tails in power10 Q8/Q4 MMA matmul (#24753) ggml-cpu: support K tails in Power10 MMA Q8/Q4 matmul. This patch removes the requirement that K be divisible by kc in the tinyBlas_Q0_PPC tiled matmul path. Process the final K panel using its actual depth and pass…
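The b9717 change concerns the Power10 MMA kernels for quantized types. The float reference sketch below shows only the blocking logic the entry describes: K is walked in panels of depth `kc`, and the last panel uses its real depth, so K need not be a multiple of `kc`. The function name and the row-major layout are assumptions, and plain loops stand in for the MMA instructions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Reference K-blocked matmul C = A * B (A: M x K, B: K x N, row-major).
void matmul_k_blocked(const std::vector<float> & A, const std::vector<float> & B,
                      std::vector<float> & C, std::size_t M, std::size_t N,
                      std::size_t K, std::size_t kc) {
    assert(kc > 0 && A.size() == M * K && B.size() == K * N);
    C.assign(M * N, 0.0f);
    for (std::size_t k0 = 0; k0 < K; k0 += kc) {
        const std::size_t k_len = std::min(kc, K - k0); // tail panel may be shorter than kc
        for (std::size_t i = 0; i < M; ++i) {
            for (std::size_t j = 0; j < N; ++j) {
                float acc = 0.0f;
                for (std::size_t k = k0; k < k0 + k_len; ++k) {
                    acc += A[i * K + k] * B[k * N + j];
                }
                C[i * N + j] += acc; // accumulate this panel's partial sum
            }
        }
    }
}
```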
llama.cpp releases dev-tools 11d ago
b9716
mtmd: add batching support for internvl (#24775)
llama.cpp releases dev-tools 11d ago
b9715
Ggml/cuda col2im 1d (#24417) cuda: add GGML_OP_COL2IM_1D, follow-up to the CPU op cuda: col2im_1d use fast_div_modulo for the index decomposition cuda: col2im_1d tighten supports_op, type match and contiguous dst
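As background for b9715, the sketch below is a CPU reference for a single-channel 1D col2im, written in gather form. It assumes the common definition of col2im as the adjoint of im2col: column entry (k, l) lands at output position `l * stride + k - pad`, and overlapping entries are summed. The per-output division and modulo is the kind of index decomposition the entry says the CUDA kernel speeds up with `fast_div_modulo`. The column layout here is an assumption, not ggml's.

```cpp
#include <cassert>
#include <vector>

// Gather-form 1D col2im for one channel. col is row-major: K rows of L columns.
// Output position t receives col[k][l] for every (k, l) with l * stride + k - pad == t.
std::vector<float> col2im_1d(const std::vector<float> & col, int K, int L,
                             int stride, int pad, int out_len) {
    assert(stride > 0 && static_cast<int>(col.size()) == K * L);
    std::vector<float> out(out_len, 0.0f);
    for (int t = 0; t < out_len; ++t) {
        float acc = 0.0f;
        for (int k = 0; k < K; ++k) {
            const int num = t + pad - k;              // must equal l * stride
            if (num < 0 || num % stride != 0) continue;
            const int l = num / stride;               // index decomposition via div/mod
            if (l < L) acc += col[k * L + l];
        }
        out[t] = acc;
    }
    return out;
}
```

The gather form lets each output element be computed independently, which maps naturally onto one GPU thread per output. A scatter form would instead need atomic additions for overlapping windows.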
llama.cpp releases dev-tools 11d ago
b9714
server: add "X-Accel-Buffering": "no" header to streaming endpoints (#24774) server: add "X-Accel-Buffering": "no" header to streaming endpoints. This header tells Nginx, when used as a reverse proxy, not to buffer responses (it only affects streaming endpoints). Without it, Nginx will…
llama.cpp releases dev-tools 11d ago
b9713
mtmd: add batching for mtmd-cli, add video tests (#24778)
llama.cpp releases dev-tools 11d ago
b9712
cmake : fix ui build with read-only source (#24752)
llama.cpp releases dev-tools 11d ago
b9711
mtmd: refactor llava-uhd overview image handling (always use ov_img_first) (#24769) add dedicated "overview" for mtmd_image_preproc_out corrections correct (again) nits nits (2)
llama.cpp releases dev-tools 11d ago
b9707
server: add "schema" and validation (#24150) wip working correct some limits add field name to error message
llama.cpp releases dev-tools 11d ago
b9704
server : return HTTP 400 on invalid grammar (#24144) (#24154) Throw on grammar parse failure so the server returns HTTP 400 instead of silently dropping the constraint. Add a regression test for the invalid-grammar response. Fixes #24144
llama.cpp releases dev-tools 11d ago
b9703
server: (router) rework -hf preset repo (#24739) server: temporarily remove HF remote preset rework remove preset.ini support rm unused get_remote_preset_whitelist() print warning add docs rm stray file
llama.cpp releases dev-tools 11d ago
b9702
server: fix router args not being forwarded to child instances (#24760)