Home Status News MCP Pricing Sign in

llama.cpp releases

455 articles archived · Visit source ↗ · RSS

llama.cpp releases dev-tools 1mo ago

b9313

ggml : Parallelize quant LUT init ( #23595 ) Use OpenMP to parallelize iq2xs_init_impl and iq3xs_init_impl. Move the OpenMP detection from ggml-cpu to ggml-base. Update OpenMP dependencies in ggml-config.cmake.in. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon…

28
llama.cpp releases dev-tools 1mo ago

b9311

vendor : update cpp-httplib to 0.45.1 ( #23639 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu…

25
llama.cpp releases dev-tools 1mo ago

b9310

server: fix checkpoints creation ( #22929 ) common : add common_chat_split_by_role cont : fix spans to reach end of message server: fix checkpoints creation extract message_spans from chat templates find the prompt token position before the latest user message split prompt…

36
llama.cpp releases dev-tools 1mo ago

b9309: perplexity : fix even more integer overflows (#23623)

Co-authored-by: Stanisław Szymczyk

30
llama.cpp releases dev-tools 1mo ago

b9305

cmake : fix ui build ( #23592 ) cmake/ui : add -fPIC to llama-ui static lib cmake : rename host compiled embed helper macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU)…

29
llama.cpp releases dev-tools 1mo ago

b9301

hexagon: apply repl optimization in flash attn softmax as #22993 ( #23 …

27
llama.cpp releases dev-tools 1mo ago

b9297

model : add NVFP4 MTP scale tensors ( #23563 ) Add NVFP4 MTP scale tensors Link Qwen3.5 MTP tensors Aligned nullptr macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU)…

11
llama.cpp releases dev-tools 1mo ago

b9296

ggml : Check the right iface method before using the fallback 2d get ( #23514 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…

9
llama.cpp releases dev-tools 1mo ago

b9295

vulkan: fix windows find_package of SPIRV-Headers ( #23215 ) vulkan: fix windows find_package of SPIRV-Headers not windows-only macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu…

17
llama.cpp releases dev-tools 1mo ago

b9294

opencl: generalize Adreno MoE kernels on M ( #23449 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…

17
llama.cpp releases dev-tools 1mo ago

b9291

SYCL: improve MoE prefill throughput ( #23142 ) change k_copy_src1_to_contiguous so that it uses a precomputed contiguous mapping where all rows "owned" by an expert are in one slice with known starts and ends switch the O(n_as * n_routed_rows) contraption to a counting sort-based…

27
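The b9291 summary above mentions a counting-sort-based mapping that places all rows owned by an expert in one contiguous slice with known starts and ends. A minimal Python sketch of that general idea follows; it is not the SYCL kernel from the PR, and the names `group_rows_by_expert`, `expert_ids`, and `n_experts` are hypothetical, chosen only for illustration.

```python
def group_rows_by_expert(expert_ids, n_experts):
    """Counting sort of row indices by expert id.

    Returns (order, starts, ends): rows routed to expert e are
    order[starts[e]:ends[e]]. Runs in O(n_rows + n_experts).
    All names here are illustrative, not taken from llama.cpp.
    """
    # Pass 1: count how many rows each expert owns.
    counts = [0] * n_experts
    for e in expert_ids:
        counts[e] += 1

    # Exclusive prefix sum gives the start of each expert's slice.
    starts = [0] * n_experts
    for e in range(1, n_experts):
        starts[e] = starts[e - 1] + counts[e - 1]
    ends = [s + c for s, c in zip(starts, counts)]

    # Pass 2: scatter each row index into its expert's slice.
    order = [0] * len(expert_ids)
    cursor = list(starts)
    for row, e in enumerate(expert_ids):
        order[cursor[e]] = row
        cursor[e] += 1
    return order, starts, ends


# Five rows routed to three experts (made-up routing).
order, starts, ends = group_rows_by_expert([2, 0, 2, 1, 0], 3)
print(order, starts, ends)  # [1, 4, 3, 0, 2] [0, 2, 3] [2, 3, 5]
```

Two linear passes replace a scan of every row for every expert, which is the kind of cost the O(n_as * n_routed_rows) wording in the summary appears to describe.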
llama.cpp releases dev-tools 1mo ago

b9292

perplexity : fix integer overflow ( #23496 ) Co-authored-by: Stanisław Szymczyk macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU)…

13
llama.cpp releases dev-tools 1mo ago

b9290

sycl : Level Zero detection in ggml_sycl_init ( #23097 ) [SYCL] Centralize Level Zero detection in ggml_sycl_init use the same wording get back the warning macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework…

15
llama.cpp releases dev-tools 1mo ago

b9289

SYCL : gated_delta_net K>1 ( #23174 ) sycl_gated_delta_net K>1 editor_config macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…

34
llama.cpp releases dev-tools 1mo ago

b9286

ggml-zendnn : add Q8_0 quantization support ( #23414 ) ggml-zendnn : add Q8_0 quantization support ggml-zendnn : sync with latest ZenDNN ggml-zendnn : address review comments for Q8_0 macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS…

15
llama.cpp releases dev-tools 1mo ago

b9285

cmake : build router app only during standalone builds ( #23521 ) Co-authored-by: Stanisław Szymczyk macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64…

17
llama.cpp releases dev-tools 1mo ago

b9284

vocab : fix HybridDNA tokenizer ( #23466 ) vocab : mark hybriddna k-mers to avoid BPE token collisions improved loop Co-authored-by: Sigbjørn Skjæret macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel…

9
llama.cpp releases dev-tools 1mo ago

b9283

cmake : add install() for impl libraries + fix apple builds ( #23511 ) pi : update ci : fix ios build ci : fix android ci : fix apple builds cmake : add install() for impl libraries Add install(TARGETS LIBRARY) for all -impl libraries that were changed from STATIC to shared…

14
llama.cpp releases dev-tools 1mo ago

b9279

vulkan: fuse snake activation (mul, sin, sqr, mul, add) ( #22855 ) vulkan: fuse snake activation (mul, sin, sqr, mul, add) Add snake.comp shader with F32 / F16 / BF16 pipelines and ggml_vk_snake_dispatch_fused. The matcher recognizes the naive 5 op decomposition emitted by audio…

23
llama.cpp releases dev-tools 1mo ago

b9277

tests : move save-load-state from examples to tests ( #23336 ) tests : move save-load-state from examples to tests Move examples/save-load-state/ to tests/test-save-load-state.cpp Remove subdirectory reference from examples/CMakeLists.txt Add test to tests/CMakeLists.txt as a…

25
llama.cpp releases dev-tools 1mo ago

b9276

server: expose prompt token counts in /slots endpoint ( #23454 ) Add n_prompt_tokens, n_prompt_tokens_processed, and n_prompt_tokens_cache to the /slots JSON response. These fields are already tracked internally but were not exposed, making it impossible for clients to monitor…

15
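The b9276 summary above names three counters added to the /slots JSON response. A minimal sketch of how a client might read them, assuming the response is a list of slot objects each carrying an `id`; the helper name `summarize_slots`, the sample payload, and its values are all hypothetical, and only the three field names come from the release note.

```python
from typing import Any

# Field names as given in the release note for #23454.
FIELDS = (
    "n_prompt_tokens",
    "n_prompt_tokens_processed",
    "n_prompt_tokens_cache",
)


def summarize_slots(slots: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Extract the prompt-token counters from each slot object.

    Missing fields become None, so the helper also tolerates a
    server build that predates the change.
    """
    summary = []
    for slot in slots:
        row = {"id": slot.get("id")}
        for field in FIELDS:
            row[field] = slot.get(field)
        summary.append(row)
    return summary


# Made-up payload for illustration; a real response has more keys.
sample = [
    {
        "id": 0,
        "n_prompt_tokens": 120,
        "n_prompt_tokens_processed": 20,
        "n_prompt_tokens_cache": 100,
    }
]
for row in summarize_slots(sample):
    print(row)
```

In practice the list would come from an HTTP GET against the server's /slots endpoint; the sketch leaves the request out so it stays runnable without a server.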
llama.cpp releases dev-tools 1mo ago

b9275

metal : optimize concat kernel and fix set kernel threads ( #23411 ) metal : fix GGML_OP_SET kernel threads tests : extend test_cpy to support different src/dst shapes Extend test_cpy to support different source and destination tensor shapes for CPY operations (reshaping), where…

37
llama.cpp releases dev-tools 1mo ago

b9274

server : free draft/MTP resources on sleep to fix VRAM leak ( #23461 ) The destroy() function in server_context_impl only cleaned up the main model and context (via llama_init.reset()) but did not free the speculative decoder (spec), draft context (ctx_dft), or draft model…

22
llama.cpp releases dev-tools 1mo ago

b9282

CUDA: fix PDL CC check for JIT compilation ( #23471 )

31
llama.cpp releases dev-tools 1mo ago

b9273

server: re-inject subcommand when router spawns children under unified binary ( #23442 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu…

32
llama.cpp releases dev-tools 1mo ago

b9272

app : add batched-bench, fit-params, quantize & perplexity ( #23459 ) app : add batched-bench, fit-params, quantize & perplexity Signed-off-by: Adrien Gallouët Add missing main.cpp Signed-off-by: Adrien Gallouët Add EOL Signed-off-by:…

37
llama.cpp releases dev-tools 1mo ago

b9271

mtp: use inp_out_ids for skipping logit computation ( #23433 ) when doing a follow-up decode for the draft model, we were always doing the logit computation even though it is not required. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS…

23
llama.cpp releases dev-tools 1mo ago

b9270

vocab : add Carbon-3B (HybridDNATokenizer) support ( #23410 ) vocab : add Carbon-3B (HybridDNATokenizer) support Adds a new BPE pre-type LLAMA_VOCAB_PRE_TYPE_CARBON for the HybridDNATokenizer used by HuggingFaceBio/Carbon-{500M,3B,8B}. The base BPE is Qwen3-4B-Base's; what…

11
llama.cpp releases dev-tools 1mo ago

b9267

ggml : Check the right iface method before using the fallback 2d get ( #23306 ) Probably no backends implement only one of 2d get/set, but this might be annoying for some future backend developer trying to add 2d get/set. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple…

11
llama.cpp releases dev-tools 1mo ago

b9266

llama-graph: fix null-buffer crash in llm_graph_input_attn_kv_iswa for SWA-only models ( #23131 ) When a model has zero non-SWA attention layers (e.g. a SWA-only slice of Gemma 4), the base KV cache has no layer tensors. The input tensors (self_k_idxs, self_v_idxs, self_kq_mask)…

13
llama.cpp releases dev-tools 1mo ago

b9264

app : show version ( #23426 ) Signed-off-by: Adrien Gallouët macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…

26
llama.cpp releases dev-tools 1mo ago

b9265: hexagon: ssm-conv fix for large prompts (#23307)

hexagon: remove gathers and better handling of vtcm in ssm-conv hexagon: relax ssm-conv gating requirements hexagon: add new prefill ssm-conv backend test hexagon: remove trailing white space hex-rope: uninline rope_cache_init, otherwise it breaks after rebasing with SSM_CONV…

34
llama.cpp releases dev-tools 1mo ago

b9263

mtmd, model : merge HunyuanOCR into HunyuanVL and fix OCR vision precision ( #23329 ) HunyuanOCR shares the same HF arch and vision layout as HunyuanVL but was split into a separate path that skipped the +0.1 bilinear sampler used by the HF reference. Collapse OCR into the…

12
llama.cpp releases dev-tools 1mo ago

b9260

opencl: refactor backend initialization ( #23318 ) opencl: refactor initialization opencl: refactor GPU identification opencl: rename for consistency opencl: cache global mem size in dev_ctx opencl: adjust log level opencl: load argsort and flash_attn kernels in supports_op…

7
llama.cpp releases dev-tools 1mo ago

b9259

common/speculative : fix nullptr crash in get_devices_str ( #23386 ) ggml_backend_dev_by_name always appends a nullptr sentinel to the devices vector. Skipping nullptr entries prevents assertion failure in ggml_backend_dev_name. Assisted-by: llama.cpp:local pi macOS/iOS: macOS…

20
llama.cpp releases dev-tools 1mo ago

b9258

mtmd : DeepSeek-OCR image processing fixes, img_tool::resize padding refactor ( #23345 ) mtmd : deepseek-ocr fixes, improvements and refactoring image processing changes to achieve full parity with Pillow (reference impl) SAM mask casting only when flash-attn is on SAM refactor…

24
llama.cpp releases dev-tools 1mo ago

b9257

vulkan: optimize operations in the IM2COL shader ( #22685 ) vulkan: optimize operations in the IM2COL shader Add comments and improve the code formatting macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux:…

23
llama.cpp releases dev-tools 1mo ago

b9255

hexagon: HMX quantized matmul rework ( #23368 ) hmx-mm: update debug logging in hmx-mm hmx-mm: update dequant logic to use HVX_vector_x2/4 hmx-mm: remove non-pipelined version of the quantize matmul It seems that we don't really need non-pipelined version hmx-mm: use activation…

36
llama.cpp releases dev-tools 1mo ago

b9254

Programmatic Dependent Launch (PDL) for more performance on newer NVIDIA GPUs (Hopper+) ( #22522 ) Adds initial PDL setup. Adds PDL barriers based on simple heuristic: place "sync" before first input pointer access, and "launch" after last write, e.g. to tensors like dst.…

17
llama.cpp releases dev-tools 1mo ago

b9253

app : introduce the llama unified executable ( #23296 ) app : introduce the llama unified executable Signed-off-by: Adrien Gallouët Use serve for server Signed-off-by: Adrien Gallouët Hide completion and bench, add help command…

26
llama.cpp releases dev-tools 1mo ago

b9251

mtmd: fit_params now take into account mmproj ( #21489 ) mtmd: fit_params now take into account mmproj rename alloc_compute_meta to reserve_compute_meta rm unused functions add ggml_backend_dev_t support add debug log macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon…

23
llama.cpp releases dev-tools 1mo ago

b9247

metal : optimize pad + cpy ( #23354 ) metal : optimize pad metal : optimize cpy cont : better row packing in threadgroup macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64…

29
llama.cpp releases dev-tools 1mo ago

b9246: snapdragon: update toolchain to v0.6 (#23369)

snapdragon: update compiler flags to enable all CPU features snapdragon: update readme to point to toolchain v0.6 snapdragon: bump toolchain docker to v0.6

37
llama.cpp releases dev-tools 1mo ago

b9245

ggml-cuda: tune RDNA3 Q6_K MMVQ nwarps ( #23349 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu…

35
llama.cpp releases dev-tools 1mo ago

b9244

opencl: add MoE support for q4_k, q5_k, q6_k on Adreno ( #23303 ) opencl: add q4_k moe support opencl: add q5_k moe support opencl: add q6_k moe support opencl: adjust format Co-authored-by: Li He macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon…

33
llama.cpp releases dev-tools 1mo ago

b9243

hexagon: add MROPE and IMROPE support in HTP rope op ( #23317 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

18
llama.cpp releases dev-tools 1mo ago

b9235

llama : MTP clean-up ( #23269 ) llama : disable equal splits for recurrent memory with partial rollback spec : re-enable p-min with MTP drafts spec : re-enable ngram spec in combination with RS rollback spec : fix ngram-map-* params spec : fix acceptance logic in combined ngram…

27
llama.cpp releases dev-tools 1mo ago

b9240

common: fix --help for --verbosity ( #23278 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64…

5
llama.cpp releases dev-tools 1mo ago

b9239

common: fix --fit verbosity with --verbosity 4 ( #23282 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

35
llama.cpp releases dev-tools 1mo ago

b9222

hexagon: add support for TRI op ( #22822 ) Hexagon: TRI HVX Kernel addition to ggml hexagon HTP ops and context addressed PR review comments for TRI op hexagon: clang format hex-unary: remove merge conflict markers hex-ggml: remove duplicate op cases (merge conflict) hex-ggml:…

36

Prismix · © 2026 · AI Hub

All product names and logos are trademarks of their respective owners.