llama.cpp releases

455 articles archived · Visit source ↗ · RSS

llama.cpp releases dev-tools 26d ago

b9489

cuda: reserve space for quantize kv-cache at startup ( #23907 ) cuda: reserve space for quantize kv-cache at startup address review comments remove forward decl Co-authored-by: Johannes Gäßler remove assert in ggml-cuda.cu Co-authored-by: Johannes Gäßler…

25
llama.cpp releases dev-tools 27d ago

b9488

tests : add support for qwen3 SSM archs ( #24031 ) tests : add support for qwen3 SSM archs arch : add LLM_KV_ATTENTION_RECURRENT_LAYERS cont : naming + TODOs macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS…

24
llama.cpp releases dev-tools 27d ago

b9486

ci : disable ccache for msvc windows release jobs ( #23911 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

6
llama.cpp releases dev-tools 27d ago

b9487

update BoringSSL to 0.20260526.0 ( #23794 )

28
llama.cpp releases dev-tools 27d ago

b9485

arg : removed unnecessary mmproj download when users pass --no-mmproj ( #23425 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…

38
llama.cpp releases dev-tools 27d ago

b9484

opencl: use flat variants of q4_K and q6_K gemv for very large M ( #24006 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64…

6
llama.cpp releases dev-tools 27d ago

b9483

hexagon: profiler output fix and script updates ( #24042 ) hex-ops: fix profiler output (ie remove the redundant NONEs) hex-prof: update profiling script to support tot.usec column macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED…

26
llama.cpp releases dev-tools 27d ago

b9482

model: add Mellum architecture ( #23966 ) model: support for Mellum architecture model: improve mellum.py formatting model: improve mellum.py formatting once again deps: downgrade transformers to 4.57.6 (to fix CI) deps: remove huggingface_hub dependency deps: remove…

13
llama.cpp releases dev-tools 27d ago

b9481

model : support granite multilingual embeddings R2 (ibm-granite/granite-embedding-{97,311}m-multilingual-r2) ( #22716 ) Add support for the ibm-granite/granite-embedding-{97m,311m}-multilingual-r2 embedding models: Added a version of the gpt4o tokenizer that has a fixed regex…

22
llama.cpp releases dev-tools 27d ago

b9480

StepFun 3.5 MTP ( #23274 ) StepFun 3.5 MTP Simplify to single layer Rollback core changes fix flake8 errors Remove scripts modify to convention Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret dos2unix Co-authored-by: Sigbjørn…

14
llama.cpp releases dev-tools 27d ago

b9479

common : fix state save in common_prompt_batch_decode ( #23468 ) common : fix state save in common_prompt_batch_decode This commit addresses a bug in common_prompt_batch_decode that affects the session state store/restore in completion.cpp and save-load-state.cpp. The motivation…

10
llama.cpp releases dev-tools 27d ago

b9478

server: add SSE ping interval ( #24013 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu…

17
llama.cpp releases dev-tools 27d ago

b9474

ui: Add Thinking mode toggle with reasoning effort levels + improvements for Chat Form Add Action UI ( #23434 ) feat: Add "Thinking" toggle and status icon + redesign Chat Form Actions Add panel test: Update test reference fix: Icon fix: E2E test command fix: wait for greeting…

5
llama.cpp releases dev-tools 27d ago

b9473

kv-cache : SWA checkpoints store only non-masked cells ( #23981 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…

37
llama.cpp releases dev-tools 28d ago

b9471

llama : deprecate llama_set_warmup ( #24009 ) llama : deprecate llama_set_warmup cont : fix type Co-authored-by: Daniel Bevenius Co-authored-by: Daniel Bevenius macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64,…

19
llama.cpp releases dev-tools 28d ago

b9470

hexagon: MUL_MAT, MUL_MAT_ID, FLASH_ATTN and GDN cleanup and optimizations for latest models ( #23989 ) hex-mm: initial support for F32 * F32 -> F32 matmuls hex-rms-norm: fix src1 stride use in fused rms_norm_mul hex-ops: clear spad pointers in the ops that clobber it This fixes…

10
llama.cpp releases dev-tools 28d ago

b9469

hexagon: add gelu_quick ( #24007 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64…

37
llama.cpp releases dev-tools 28d ago

b9468

server: real-time reasoning interruption via control endpoint ( #23971 ) server: real-time reasoning interruption via control endpoint Builds on the manual reasoning budget trigger from #23949 . Adds a CONTROL task that mirrors the CANCEL path on the live slot and calls…

17
llama.cpp releases dev-tools 28d ago

b9467

clean up unused variables warnings ( #23975 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…

14
llama.cpp releases dev-tools 28d ago

b9466

opencl: fix compiler warnings for non-adreno path ( #23922 ) opencl: fix compiler warnings for non-adreno path opencl: fix const cast warning macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux:…

31
llama.cpp releases dev-tools 28d ago

b9464

speculative : fix n_outputs_max and remove draft-simple auto-enable ( #23988 ) speculative : add common_speculative_n_max helper function Extract the speculative max-draft-size logic from server_n_outputs_max into a reusable common_speculative_n_max() function in…

7
llama.cpp releases dev-tools 28d ago

b9460

llama: limit max outputs of llama_context ( #23861 ) llama: save more VRAM by reserving n_outputs == n_seqs when possible add n_outputs_per_seq move n_outputs_max to server-context change ubatch to batch everywhere macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon…

15
llama.cpp releases dev-tools 28d ago

b9459

metal: template GLU kernels to support f16/f32 ( #23882 ) Drops the hardcoded f32 GLU kernels in favor of a single template. We now load/store in the native tensor type (half or float) to save memory bandwidth, but keep the actual ALU compute in float to avoid exploding math in…

35
llama.cpp releases dev-tools 28d ago

b9458

vulkan: don't hold the device mutex while compiling pipelines ( #23641 ) vulkan: don't hold the device mutex while compiling pipelines We need to hold a lock while we traverse all pipelines and lazily initialize them, but we don't need to hold it while the pipeline is being…
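The locking pattern this entry describes can be illustrated with a minimal Python sketch. It is not the Vulkan backend's code: `Pipeline`, `Device`, `load_pipelines`, and the `compile_fn` callback are hypothetical names, and the three-phase split (claim under the lock, compile without it, publish under it) is an assumption about one common way to take slow work out of a critical section.

```python
import threading


class Pipeline:
    """Hypothetical stand-in for a lazily compiled GPU pipeline."""

    def __init__(self, name):
        self.name = name
        self.binary = None      # filled in once compilation has finished
        self.compiled = False   # True when the binary is ready to use
        self.claimed = False    # True when a thread has taken on the compile


class Device:
    """Hypothetical device owning a mutex and a list of pipelines."""

    def __init__(self, names):
        self.mutex = threading.Lock()
        self.pipelines = [Pipeline(n) for n in names]


def load_pipelines(device, compile_fn):
    # Phase 1: traverse under the lock and claim every pipeline that still
    # needs compiling, so no other thread picks the same one.
    with device.mutex:
        todo = [p for p in device.pipelines if not p.compiled and not p.claimed]
        for p in todo:
            p.claimed = True

    # Phase 2: run the slow compile step with the lock released.
    results = [(p, compile_fn(p.name)) for p in todo]

    # Phase 3: re-take the lock only to publish the finished results.
    with device.mutex:
        for p, binary in results:
            p.binary = binary
            p.compiled = True
    return len(todo)
```

A thread that finds a pipeline claimed but not yet compiled would still have to wait for it (for example on a condition variable); the sketch leaves that part out.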

37
llama.cpp releases dev-tools 28d ago

b9457

vulkan: reduce host memory lock contention ( #23376 ) vulkan: reduces lock contention replace unique_lock with lock_guard macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU)…

26
llama.cpp releases dev-tools 28d ago

b9455

TP: quantized KV cache support ( #23792 ) TP: quantized KV cache support fix partial view remove overly strict assert macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu…

15
llama.cpp releases dev-tools 28d ago

b9453

model: Add EXAONE 4.5 implementations ( #21733 ) Add EXAONE 4.5 and Add GQA for MMproj mtmd: EXAONE 4.5 vision markers and projector path EXAONE 4.5 uses and for image boundaries; Qwen keeps <|vision_start|> and <|vision_end|>. Route EXAONE 4.5 through the Qwen2.5-VL-style…

32
llama.cpp releases dev-tools 28d ago

b9452

vulkan: Block-load Q3_K/Q6_K block data and subtract on 32b ints ( #23056 ) Q2_K/Q3_K/Q6_K do much better when using MMVQ on Intel BMG even though they're only 2-byte aligned, and Q3_K still wins on NVIDIA as well. mesa isn't all that great at coalescing back-to-back loads from…

4
llama.cpp releases dev-tools 28d ago

b9451

vulkan: Removed unused functions ( #23175 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…

37
llama.cpp releases dev-tools 29d ago

b9445: ci: remove redundant or duplicate jobs (#23927)

remove redundant apple job openvino gpu and cpu test can share the same build and machine Update build-rpc.yml Update build-openvino.yml cpu any doesn't make sense as we have an arm job already, so do high perf on both x86 and arm remove duplicate x86 vulkan combine backend…

31
llama.cpp releases dev-tools 29d ago

b9444

server : handle If-None-Match weak ETags ( #23916 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

12
llama.cpp releases dev-tools 29d ago

b9442

vocab : add tokenizer support for jina-embeddings-v2-base-zh ( #18756 ) vocab : add jina-embeddings-v2-base-zh (whitespace tokenizer) lowercase defaults to true type fix Co-authored-by: Sigbjørn Skjæret macOS/iOS: macOS Apple Silicon (arm64) macOS…

12
llama.cpp releases dev-tools 1mo ago

b9441

ui: fix ETag truncation with MSVC compiler ( #23917 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

20
llama.cpp releases dev-tools 1mo ago

b9439

llama: only use one iGPU device by default ( #23897 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

4
llama.cpp releases dev-tools 1mo ago

b9438: webui: add custom CSS injection via config (#23904)

webui: add custom CSS injection via config register a customCSS setting in the Developer section under Custom JSON, syncable so it rides the existing ui-config pass through. inject the value into a single style element in the head, reactive on the setting. lets an operator theme…

28
llama.cpp releases dev-tools 1mo ago

b9437

Support -fa auto in llama-bench ( #23714 ) Support -fa auto in llama-bench Make the default value of -ngl -1, similar to other tools. Update README with latest usage and examples Address review comments macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI…

16
llama.cpp releases dev-tools 1mo ago

b9436

opencl: support bf16 by converting to f16 ( #23839 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

18
llama.cpp releases dev-tools 1mo ago

b9434

TP: fix granularity for Qwen 3.5/3.6 + 3 GPUs ( #23843 ) TP: fix granularity for Qwen 3.5/3.6 + 3 GPUs fix afmoe TP macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu…

25
llama.cpp releases dev-tools 1mo ago

b9433

metal : restore im2col implementation for large kernels ( #23901 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…

31
llama.cpp releases dev-tools 1mo ago

b9432

test: (test-llama-archs) log the config name first ( #23885 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

37
llama.cpp releases dev-tools 1mo ago

b9431

ci : update ios-xcode release job to macos-26 ( #23906 ) ci : disable libcommon build from xcframework ocd : fix name ci : ios-xcode change to macos-26 cont : pin xcode cont : pin xcode to minor version macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI…

34
llama.cpp releases dev-tools 1mo ago

b9430

ggml : add some lsx support ( #23798 ) loongarch : optimize LSX fp16 load/store with native intrinsics Use __lsx_vfcvtl_s_h and __lsx_vfcvt_h_s instead of scalar loops in __lsx_f16x4_load and __lsx_f16x4_store. loongarch : add LSX implementation for q8_0 dot product loongarch :…

21
llama.cpp releases dev-tools 1mo ago

b9428

ci : fix s390x release job ( #23898 ) ci : fix s390x release job ci : multi-thread build for ios-xcode ocd : names macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64…

6
llama.cpp releases dev-tools 1mo ago

b9426: llama : do not skip iGPU when only RPC devices are present (#23868)

After #23007 reclassified integrated CUDA/HIP devices as IGPU, the device selection logic dropped the local iGPU whenever any RPC server was added, because RPC devices made model->devices non-empty. On systems where the "iGPU" is the main compute device (e.g. Strix Halo with 128…
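The selection bug described in this entry can be modelled with a small Python sketch. It is not llama.cpp's device-selection code: the `Dev` type, the `kind` labels, and both functions are hypothetical, and the "fixed" rule (keep the iGPU whenever no local discrete GPU was selected) is an assumption inferred from the release title, since the summary is truncated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dev:
    name: str
    kind: str  # hypothetical labels: "gpu", "igpu", or "rpc"


def select_buggy(devs):
    # Modelled bug: the iGPU is only a fallback for an empty device list,
    # so a single RPC device is enough to drop it.
    chosen = [d for d in devs if d.kind in ("gpu", "rpc")]
    if not chosen:
        chosen += [d for d in devs if d.kind == "igpu"]
    return chosen


def select_fixed(devs):
    # Assumed fix: RPC devices do not count as local compute, so the iGPU
    # is kept unless a local discrete GPU was selected.
    chosen = [d for d in devs if d.kind in ("gpu", "rpc")]
    if not any(d.kind == "gpu" for d in chosen):
        chosen += [d for d in devs if d.kind == "igpu"]
    return chosen
```

With one RPC device and one iGPU, `select_buggy` returns only the RPC device, while `select_fixed` returns both.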

12
llama.cpp releases dev-tools 1mo ago

b9415

download: add option to skip_download ( #23059 ) download: add option to skip_download fix fix 2 if file doesn't exist, respect skip_download flag macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework…

16
llama.cpp releases dev-tools 1mo ago

b9414

mtmd: Add DeepSeekOCR 2 Support ( #20975 ) mtmd: DeepSeek-OCR 2 support, with multi-tile dynamic resolution introduced clip_image_f32::add_viewsep address PR review drop redundant ggml_cpy ops in both deepseekocr versions build drop no-op ggml_cont in build_sam assert…

30
llama.cpp releases dev-tools 1mo ago

b9413

CUDA: Check PTX version on host side to guard PDL dispatch ( #23530 ) CUDA: Check PTX version on host side to guard PDL dispatch Checking on __CUDA_ARCH_LIST__ alone is insufficient for JIT, as this variable doesn't differentiate between compiling for say sm_90, sm_90a or sm_90f…

26
llama.cpp releases dev-tools 1mo ago

b9412

server: bump timeout to 3600s ( #23842 ) server: bump timeout to 3600s nits: change wording macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x…

32
llama.cpp releases dev-tools 1mo ago

b9411

model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation ( #23346 ) llama : support DeepSeek V3.2 model family (with DSA lightning indexer) convert : handle DeepseekV32ForCausalLM architecture ggml : support for f16 GGML_OP_FILL…

34
llama.cpp releases dev-tools 1mo ago

b9410

llama: use f16 mask for FA to save VRAM ( #23764 ) llama: use f16 mask for FA review: add llama_cast + formatting simplify macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU)…

7

Page 6 of 10 · 455 articles

Prismix · © 2026 · AI Hub

All product names and logos are trademarks of their respective owners.
