llama.cpp releases

454 articles archived · Visit source ↗ · RSS

llama.cpp releases dev-tools 7d ago

b9763

server : Add id to tool call responses api ( #24882 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

32
llama.cpp releases dev-tools 7d ago

b9761

server: (router) move model downloading to dedicated process ( #24834 ) server: real-time model load progress tracking via /models/sse update docs server: move model download to child process rm unused fix most problems clean up nit fixes fix test case do not detact() thread…

8
llama.cpp releases dev-tools 7d ago

b9760

server: refactor/generalize input file schema ( #24299 ) server: refactor/generalize input file schema wire up input_video, accept raw base64 nits nits (2) fix windows macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64)…

36
llama.cpp releases dev-tools 7d ago

b9758

[SYCL] support bf16 on bin_bcast OP and unary OPs ( #24838 ) support bf16 on bin_bcast OP and unary OPs support the older Intel compiler than 2026.0 macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework…

23
llama.cpp releases dev-tools 7d ago

b9757

sampling : remove unconditional softmax+sort in top-n-sigma sampler ( #22645 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…

13
llama.cpp releases dev-tools 7d ago

b9756

server: fix edit_file crash on append at end of file (line_start -1) ( #24893 ) line_start -1 normalized to n+1, so append inserted at lines.begin() + n + 1, one past end() -> heap-buffer-overflow in vector::_M_range_insert. Normalize -1 to n (insert at end()), restrict -1 to…

24
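
The off-by-one described in b9756 can be sketched as follows. This is a minimal illustration, not the server's actual code: the helper names `insert_offset` and `insert_lines` are hypothetical, and the assumption that positive `line_start` values are 1-based is ours, since the excerpt is truncated. It shows why -1 has to map to n (the `end()` position) rather than n + 1.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not the llama.cpp implementation): map line_start to a
// 0-based insertion offset in [0, n]. Assumes positive values are 1-based and
// -1 means "append at end of file".
static std::size_t insert_offset(long line_start, std::size_t n) {
    if (line_start == -1) {
        return n; // insert at end(); n + 1 would point one past end()
    }
    if (line_start < 1 || static_cast<std::size_t>(line_start) > n + 1) {
        throw std::out_of_range("line_start out of range");
    }
    return static_cast<std::size_t>(line_start) - 1;
}

// Insert `extra` into `lines` at the position selected by line_start.
static void insert_lines(std::vector<std::string> & lines, long line_start,
                         const std::vector<std::string> & extra) {
    const std::size_t off = insert_offset(line_start, lines.size());
    lines.insert(lines.begin() + static_cast<std::ptrdiff_t>(off),
                 extra.begin(), extra.end());
}
```

Calling `insert_lines(lines, -1, extra)` appends after the last existing line, and any offset outside [0, n] is rejected before it reaches `vector::insert`.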
llama.cpp releases dev-tools 8d ago

b9755

docs/android.md: Add dependency libandroid-spawn for building in te…

7
llama.cpp releases dev-tools 8d ago

b9754

common/peg : implement ac parser for stricter grammar generation ( #24869 ) common/peg : implement ac parser cont : extract functions cont : tidy up cont : remove a test cont : move ac() def macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled)…

33
llama.cpp releases dev-tools 8d ago

b9753

server: fix report progress for loading spec models, add "stages" list ( #24870 ) server: fix report progress for loading spec models, add "stages" list improve nits nits 2 macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel…

28
llama.cpp releases dev-tools 8d ago

b9752

server: refactor batch construction ( #24843 ) server: refactor batch construction wip wip 2 wip 3 wip 4 add abort_all_slots handle batch full more carefully fix assert rm debug log small nits (debug) add timings debug: force llama_synchronize for accurate timings address…

5
llama.cpp releases dev-tools 8d ago

b9751

mtmd: fix mtmd_get_memory_usage ( #24867 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…

13
llama.cpp releases dev-tools 8d ago

b9750

jinja : implement call statement ( #24847 ) implement call statement undo unintended change de-lambda simplify move caller context inside function handler macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS…

9
llama.cpp releases dev-tools 8d ago

b9748

server: add "verbose" field to schema ( #24864 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

18
llama.cpp releases dev-tools 8d ago

b9747

server: real-time model load progress tracking via /models/sse ( #24828 ) server: real-time model load progress tracking via /models/sse update docs add mutex for notify_to_router correct docs macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled)…

28
llama.cpp releases dev-tools 8d ago

b9745

spec : Support Step3.5/3.7 flash mtp3 ( #24340 ) add mtp_layer_offset + include nextn flags in graph reuse add llama_set_mtp_layer_offset + llama_model_n_nextn_layer API offset head select + require all MTP blocks speculative multi-head process() speculative multi-head draft()…

6
llama.cpp releases dev-tools 9d ago

b9744

common/peg : refactor until gbnf grammar generation ( #24839 ) common/peg : refactor until gbnf grammar into an ac automaton cont : add a test with multiple strings cont : pad state with 0s so rules line up cont : clean up comments cont : use set everywhere cont : inline state…

4
llama.cpp releases dev-tools 9d ago

b9743

common/json-schema-to-grammar : align spacing rules with parsers ( #24835 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…

11
llama.cpp releases dev-tools 9d ago

b9742

fix(hexagon): use padded stride for ssm-conv weights ( #24470 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…

4
llama.cpp releases dev-tools 9d ago

b9741

llama : use LLM_KV for quantization_version & file_type ( #24802 ) Signed-off-by: Adrien Gallouët macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu…

27
llama.cpp releases dev-tools 9d ago

b9740

arg: try fixing test-args-parser randomly fails ( #24826 ) arg: try fixing test-args-parser randomly fails return ref try triggering the workflow exception wrapper wip test test 2 arg: guard win32 utf8 argv override make_utf8_argv rebuilds argv from GetCommandLineW to fix utf8…

8
llama.cpp releases dev-tools 9d ago

b9739

release: add missing link for win opencl adreno arm64 ( #24809 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…

33
llama.cpp releases dev-tools 9d ago

b9738

server: avoid forwarding auth headers in CORS proxy ( #24373 ) server: avoid forwarding auth headers in CORS proxy format fix test fix e2e test Co-authored-by: Xuan Son Nguyen macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled)…

19
llama.cpp releases dev-tools 9d ago

b9736

model : glm-dsa load DSA indexer tensors as optional ( #24770 ) GLM-5.2 ships the DSA "lightning indexer" on only a subset of layers (the "full" layers; others omit it), but the GLM_DSA loader created the five indexer tensors on every layer as required, so loading any GLM-5.2…

14
llama.cpp releases dev-tools 9d ago

b9735

ggml : optimize AMX ( #24806 ) Flatten the partition over n_batch * M so every thread participates in the quantization | CPU | Model | Test | t/s OLD | t/s NEW | Speedup |…

37
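
One way to read "flatten the partition over n_batch * M" in b9735 is sketched below. This is an assumption-laden illustration, not the AMX code: `work_range` and `flat_partition` are hypothetical names, and the even split by integer division is our choice. The idea is that threads divide the flattened row range, so work is available to every thread whenever n_batch * M is at least the thread count, even if n_batch alone is smaller.

```cpp
#include <cassert>
#include <cstddef>

// Half-open range [begin, end) of flattened row indices.
struct work_range {
    std::size_t begin;
    std::size_t end;
};

// Hypothetical sketch: split n_batch * M flattened rows across nth threads.
// A flat index r corresponds to batch r / M and row r % M.
static work_range flat_partition(std::size_t n_batch, std::size_t M,
                                 std::size_t ith, std::size_t nth) {
    const std::size_t total = n_batch * M;
    return { ith * total / nth, (ith + 1) * total / nth };
}
```

With n_batch = 2, M = 3 and 4 threads, splitting on n_batch alone would leave two threads idle, while the flattened split gives each of the four threads at least one row.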
llama.cpp releases dev-tools 9d ago

b9737

docker : prebuild web UI for s390x build [no release] ( #24829 )

31
llama.cpp releases dev-tools 10d ago

b9733

ggml-webgpu: add adapter toggles for F16 on Vulkan + NVIDIA macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

11
llama.cpp releases dev-tools 10d ago

b9732

server: refactor child --> router communication ( #24821 ) server: refactor child --> router communication fix wakeup case add docs improve update_status() nits macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS…

13
llama.cpp releases dev-tools 10d ago

b9731

server : optimize get_token_probabilities ( #24796 ) Use std::partial_sort to order only the requested top-n tokens instead of the full vocabulary logprobs sort: vocab=128000 n_top=0 iters=100 full sort: 8555.6 us/op partial sort: 704.3 us/op Signed-off-by: Adrien Gallouët…

37
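
The technique named in b9731, ordering only the requested top-n entries with `std::partial_sort`, can be sketched as below. The `token_prob` struct and `top_n_probs` function are hypothetical stand-ins for illustration and do not mirror the server's `get_token_probabilities` signature.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record type: a token id and its probability.
struct token_prob {
    int   id;
    float p;
};

// Return the n_top most probable tokens in descending order of p.
// Only the first n_top positions are ordered; the rest of the vocabulary is
// left unsorted and then dropped.
static std::vector<token_prob> top_n_probs(std::vector<token_prob> probs,
                                           std::size_t n_top) {
    n_top = std::min(n_top, probs.size());
    std::partial_sort(probs.begin(),
                      probs.begin() + static_cast<std::ptrdiff_t>(n_top),
                      probs.end(),
                      [](const token_prob & a, const token_prob & b) {
                          return a.p > b.p;
                      });
    probs.resize(n_top);
    return probs;
}
```

`std::partial_sort` costs roughly O(V log n) for a vocabulary of size V and n requested entries, compared with O(V log V) for a full sort, which is consistent with the direction of the timings quoted in the entry.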
llama.cpp releases dev-tools 10d ago

b9730

mtmd, arg: fix utf8 handling on windows ( #24779 ) mtmd, arg: fix utf8 handling on windows also fix ggml_fopen fix build fail also fix CLI macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux:…

36
llama.cpp releases dev-tools 10d ago

b9729

server: remove all internal mentions about "webui" ( #24817 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

32
llama.cpp releases dev-tools 10d ago

b9728

arg: Add comment line support to --api-key-file ( #23168 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

25
llama.cpp releases dev-tools 10d ago

b9727

vendor : update cpp-httplib to 0.48.0 ( #24787 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

38
llama.cpp releases dev-tools 10d ago

b9726

server: add --agent arg, remove redundant webui naming compat ( #24801 ) server: add --agent arg, remove redundant webui naming compat corrent env fix the test llama-gen-docs nits: wordings macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled)…

10
llama.cpp releases dev-tools 10d ago

b9725

docker : build the UI ( #24794 ) docker : build the UI cont : use existing APP_VERSION

5
llama.cpp releases dev-tools 10d ago

b9724

mtmd: several bug fixes ( #24784 ) mtmd: several bug fixes fix build fix gemma4ua add sanity check in get_u32() fix build (2) area() avoid overflow macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework…

27
llama.cpp releases dev-tools 10d ago

b9723

spec: support eagle3 for qwen3.5 & 3.6 ( #24593 ) spec: support qwen3.5 & 3.6 eagle3 draft eagle3: Add deferred boundary checkpoints restore support for hybrid models apply suggestions Co-authored-by: Georgi Gerganov spec: adapt to API change spec: fix naming…

21
llama.cpp releases dev-tools 10d ago

b9722

server: fix non-bound n_discard value (ctx shifting) ( #24786 ) server: fix non-bound n_discard value Update tools/server/server-context.cpp Co-authored-by: Georgi Gerganov macOS/iOS: macOS Apple Silicon…

36
llama.cpp releases dev-tools 10d ago

b9721

sync : ggml macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64 (ROCm 7.2) Ubuntu x64…

31
llama.cpp releases dev-tools 10d ago

b9718

server : consolidate slot selection into get_available_slot ( #24755 ) Absorb get_slot_by_id logic into get_available_slot so slot selection is handled by a single function call. When a specific slot id is requested, the LCP similarity check still runs to enable proper prompt…

25
llama.cpp releases dev-tools 11d ago

b9717

ggml-cpu: support K tails in power10 Q8/Q4 MMA matmul ( #24753 ) ggml-cpu: support K tails in Power10 MMA Q8/Q4 matmul This patch removes the requirement that K be divisible by kc in the tinyBlas_Q0_PPC tiled matmul path. Process the final K panel using its actual depth and pass…

38
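
The K-tail handling described in b9717 can be illustrated with a plain scalar tiled matmul. This is a sketch under stated assumptions: it uses ordinary float arithmetic rather than Power10 MMA or quantized blocks, and `matmul_k_panels` is a hypothetical name. The point it demonstrates is that the last K panel is processed with its actual depth, so K no longer needs to be divisible by kc.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// C[M x N] = A[M x K] * B[K x N], all row-major, with K walked in panels of
// depth kc. The final panel uses depth = K - k0 when K is not a multiple of kc.
static std::vector<float> matmul_k_panels(const std::vector<float> & A,
                                          const std::vector<float> & B,
                                          std::size_t M, std::size_t N,
                                          std::size_t K, std::size_t kc) {
    assert(kc > 0 && A.size() == M * K && B.size() == K * N);
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t k0 = 0; k0 < K; k0 += kc) {
        const std::size_t depth = std::min(kc, K - k0); // tail panel may be short
        for (std::size_t i = 0; i < M; ++i) {
            for (std::size_t j = 0; j < N; ++j) {
                float acc = 0.0f;
                for (std::size_t k = 0; k < depth; ++k) {
                    acc += A[i * K + k0 + k] * B[(k0 + k) * N + j];
                }
                C[i * N + j] += acc; // accumulate this panel's contribution
            }
        }
    }
    return C;
}
```

With K = 5 and kc = 2 the loop runs panels of depth 2, 2 and 1, rather than requiring the caller to pad K up to 6.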
llama.cpp releases dev-tools 11d ago

b9716

mtmd: add batching support for internvl ( #24775 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

16
llama.cpp releases dev-tools 11d ago

b9715

Ggml/cuda col2im 1d ( #24417 ) cuda: add GGML_OP_COL2IM_1D, follow-up to the CPU op cuda: col2im_1d use fast_div_modulo for the index decomposition cuda: col2im_1d tighten supports_op, type match and contiguous dst macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon…

30
llama.cpp releases dev-tools 11d ago

b9714

server: add "X-Accel-Buffering": "no" header to streaming endpoints ( #24774 ) server: add "X-Accel-Buffering": "no" header to streaming endpoints This header tells Nginx (as a reverse proxy) to NOT buffer responses. (only affects streaming endpoints) Without it, Nginx will…

11
llama.cpp releases dev-tools 11d ago

b9713

mtmd: add batching for mtmd-cli, add video tests ( #24778 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

22
llama.cpp releases dev-tools 11d ago

b9712

cmake : fix ui build with read-only source ( #24752 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

4
llama.cpp releases dev-tools 11d ago

b9711

mtmd: refactor llava-uhd overview image handling (always use ov_img_first) ( #24769 ) add dedicated "overview" for mtmd_image_preproc_out corrections correct (again) nits nits (2) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS…

14
llama.cpp releases dev-tools 11d ago

b9707

server: add "schema" and validation ( #24150 ) wip working correct some limits add field name to error message macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64…

5
llama.cpp releases dev-tools 11d ago

b9704

server : return HTTP 400 on invalid grammar ( #24144 ) ( #24154 ) Throw on grammar parse failure so the server returns HTTP 400 instead of silently dropping the constraint. Add a regression test for the invalid-grammar response. Fixes #24144 macOS/iOS: macOS Apple Silicon…

26
llama.cpp releases dev-tools 11d ago

b9703

server: (router) rework -hf preset repo ( #24739 ) server: temporary remove HF remote preset rework remove preset.ini support rm unused get_remote_preset_whitelist() print warning add docs rm stray file macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI…

19
llama.cpp releases dev-tools 11d ago

b9702

server: fix router args not being forwarded to child instances ( #24760 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…

14

Prismix · © 2026 · AI Hub

All product names and logos are trademarks of their respective owners.
