Home Status News MCP Pricing Sign in

Home Status News MCP Pricing Sign in

News / llama.cpp releases

llama.cpp releases

454 articles archived · Visit source ↗ · RSS

Sign in to subscribe

llama.cpp releases dev-tools 15d ago

b9632

jinja : add count/d/e filter aliases ( #24606 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan)…

9
llama.cpp releases dev-tools 15d ago

b9631

cli : fix not copying preserved tokens ( #24258 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

6
llama.cpp releases dev-tools 16d ago

b9630

Add cohere2moe to llama-vocab for TINY_AYA ( #24601 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

16
llama.cpp releases dev-tools 16d ago

b9628

add sycl to check-release ( #24583 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64…

21
llama.cpp releases dev-tools 16d ago

b9627

ui : fix llama-ui-embed crash when no asset dir is given ( #24597 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…

25
llama.cpp releases dev-tools 16d ago

b9626

Add arch support for cohere2-MoE ( #24260 ) Add arch support for cohere2-MoE Removed redundant gating_func checks Changed ffn lookup to prefer prefix_dense_intermediate_size Renamed arch to cohere2moe Removed redundant lmhead check and chat template changes Removed…

36
llama.cpp releases dev-tools 16d ago

b9625

jinja : fix negative step slice with start/stop values ( #24580 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…

27
llama.cpp releases dev-tools 16d ago

b9624

ui: build-time gzip compression ( #24571 ) ui: keep original file name and path fix nocache ui: build-time gzip compression macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU)…

19
llama.cpp releases dev-tools 16d ago

b9623

jinja : fix split and replace with empty first arg ( #24574 ) fix split and replace with empty first arg fix reserve size macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU)…

13
llama.cpp releases dev-tools 16d ago

b9622

vulkan: support non-contig unary/glu ops ( #24215 ) vulkan: support non-contig unary/glu ops Change unary/glu ops to pass in all strides and use fastdiv for the index calculation. Put all unary ops in one file, similar to glu, to share the code. codex went ahead and added expm1…

15
llama.cpp releases dev-tools 16d ago

b9621

ui: keep original file name and path ( #24568 ) ui: keep original file name and path fix nocache macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu…

35
llama.cpp releases dev-tools 16d ago

b9620

server: clean up static assets handling ( #24550 ) server: clean up static assets handling nits simplify file name handling, use static file name everywhere cmake/ui : bundle UI assets in an archive ui : run prettier on post-build.js Co-authored-by: Alde Rojas [email protected]…

12
llama.cpp releases dev-tools 17d ago

b9619

fit : wrap llama_device_memory_data ( #24522 )

22
llama.cpp releases dev-tools 17d ago

b9616

ci : unbreak release harder ( #24545 ) unbreak release harder missed one remove missing test for now macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu…

29
llama.cpp releases dev-tools 17d ago

b9611

fit : avoid including llama-ext.h in fit.h ( #24506 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

28
llama.cpp releases dev-tools 17d ago

b9610

sync : ggml macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64 (ROCm 7.2) Ubuntu x64…

22
llama.cpp releases dev-tools 17d ago

b9608

vendor : update cpp-httplib to 0.47.0 ( #24395 ) Signed-off-by: Adrien Gallouët [email protected] macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu…

13
llama.cpp releases dev-tools 17d ago

b9606

spec: add EAGLE3 speculative decoding support ( #18039 ) llama : enable layer input extraction spec: support eagle3 eagle3: fix params bug eagle3: support Gemma4 eagle3 from RedHatAI eagle3: set sync when get features from target Co-authored-by: tnhnyzc…

24
llama.cpp releases dev-tools 18d ago

b9605

ggml: support concat for scalar types at cuda backend ( #24011 ) cuda: support concat for scalar types Update concat.cu fix metal ci issue macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux:…

20
llama.cpp releases dev-tools 18d ago

b9604

[SYCL] Fix CI build & release for SYCL backend ( #24387 ) restore SYCL build and release, remove github cache modify for test only verify the ccache is used remove debug code change rm duplicate action, update key in ccache add action ccache-clear after building in both ubuntu…

21
llama.cpp releases dev-tools 18d ago

b9603

opencl: add q5_0/q5_1 gemm and gemv kernels for Adreno ( #24319 ) opencl: add q5_0 adreno support opencl: add q5_1 adreno support opencl: cosmetic fix Co-authored-by: Li He [email protected] macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled)…

13
llama.cpp releases dev-tools 18d ago

b9601

vulkan: ifdef eMesaHoneykrisp (build fix) ( #24479 ) Fixes build/CI after #24306 . macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu…

13
llama.cpp releases dev-tools 18d ago

b9596

server: skip unused log lines on router mode ( #24463 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

32
llama.cpp releases dev-tools 19d ago

b9594

vocab : refactor normalizer flags into options struct, add strip_accents ( #24371 ) vocab : refactor normalizer flags into options struct, add strip_accents Update src/llama-vocab.h Co-authored-by: Sigbjørn Skjæret [email protected] Update src/llama-vocab.cpp…

27
llama.cpp releases dev-tools 19d ago

b9592

vendor : update LibreSSL to 4.3.2 ( #24397 ) Signed-off-by: Adrien Gallouët [email protected] macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x…

35
llama.cpp releases dev-tools 19d ago

b9591

Remove padding and multiple D2D copies for MTP ( #24086 ) Make ggml_gated_delta_net take only the initial recurrent state (D, 1, n_seqs) and passes the snapshot count K as an op parameter instead of inferring it from state->ne[1]. Remove the padding hack and copy all emitted…

8
llama.cpp releases dev-tools 19d ago

b9590

chat: fix LFM2/LFM2.5 ignoring json_schema ( #24377 ) The LFM2 specialized template handler only built a grammar for tool-calling, silently ignoring json_schema from response_format. macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED…

6
llama.cpp releases dev-tools 19d ago

b9589

CUDA: Fix ssm_scan_f32 data-races ( #24360 ) Add missing syncthreads before resuing cub_temp_storage __syncthreads() is required before being allowed to resue TempStorage smem:…

32
llama.cpp releases dev-tools 20d ago

b9587

speculative : fix "ngram-map-k4v" name in logging ( #24253 ) This is a non-functional change. When using --spec-type ngram-map-k4v , the log messages at startup and runtime say ngram-map-k . Added logic in the in the constructor of common_speculative_impl_ngram_map_k to pass the…

16
llama.cpp releases dev-tools 20d ago

b9586: webui: implement pinned conversations support (#21387)

webui: implement pinned conversations support webui: linter/prettier pass Fix the unused handleMobileSidebarItemClick from the component. the search should find pinned conversations as well Co-authored-by: Pascal [email protected] Co-authored-by: Pascal…

24
llama.cpp releases dev-tools 20d ago

b9585

graph: Fix granite speech model inference by applying embedding scale when deepstack is not used ( #24357 ) llama-graph : apply embedding scale when deepstack is not used nits: remove non-existant hunyuan-vl from the tests apply suggestion from @gabe-l-hart Co-authored-by: Xuan…

25
llama.cpp releases dev-tools 20d ago

b9584

ci : fix windows release ( #24369 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64…

22
llama.cpp releases dev-tools 20d ago

b9581

vulkan: reduce iq1 shared memory usage for mul_mm ( #24287 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

21
llama.cpp releases dev-tools 20d ago

b9580

vulkan: add v_dot2_f32_f16 support in matrix-matrix multiplication and Flash Attention ( #24123 ) vulkan: add support for valve fp16 dot2 extension use macro for dot2 path choice properly check for the feature add dot_product abstraction to reduce preprocessor branching…

10
llama.cpp releases dev-tools 20d ago

b9578

mtmd: refactor video subproc handling ( #24316 ) mtmd: refactor video subproc handling Update tools/mtmd/mtmd-helper.cpp Co-authored-by: Mikko Juola [email protected] Co-authored-by: Mikko Juola [email protected] macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64,…

11
llama.cpp releases dev-tools 20d ago

b9577

server: log prompts to directory ( #22031 ) server: log prompts to directory Add --log-prompts-dir to write each prompt to a separate text file in the specified directory. Apply suggestion from @ngxson Co-authored-by: Xuan-Son Nguyen [email protected] macOS/iOS: macOS Apple…

35
llama.cpp releases dev-tools 20d ago

b9575

ggml : add GGML_OP_COL2IM_1D ( #24206 ) cpu: add GGML_OP_COL2IM_1D Add the overlap-add (scatter-add) step of a 1D transposed convolution. A ConvTranspose1d factorizes as a GEMM followed by col2im: a weight pre-permuted to [IC, K OC] is contracted against the [IC, T_in] input…

4
llama.cpp releases dev-tools 20d ago

b9574

server : do not clear slots without unified KV cache ( #24190 ) Always export idle slots to RAM Without this, a slot's VRAM cache may not be written to RAM. If this slot happens to be busy then later on, this triggers needless preprocessing in another slot. cont : clean-up…

33
llama.cpp releases dev-tools 20d ago

b9573

models : fix plamo2 attention_key/value_length regression ( #24317 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…

15
llama.cpp releases dev-tools 21d ago

b9572

ggml-cpu : fix rms_norm_back wrong output under in-place aliasing ( #24305 ) ggml-cpu : fix rms_norm_back wrong output under in-place aliasing cont : clean-up comment Co-authored-by: Georgi Gerganov [email protected] macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon…

27
llama.cpp releases dev-tools 21d ago

b9571

Remove case for GGML_TYPE_Q4_K in mvvq.cu ( #23528 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64…

7
llama.cpp releases dev-tools 21d ago

b9570

ggml-webgpu: Add clang-format job ( #24308 ) Add clang-format job try local formatting macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU)…

34
llama.cpp releases dev-tools 21d ago

b9568

mtp: support for gemma-4 E2B and E4B assistants ( #24282 ) models: update converter to support smaller assistants models: add masked_embd tensors to gemma4-assist arch gemma-4: remove temp debug for conversion gemma-4-mtp: filter out masked_embedding tensors during conversion…

23
llama.cpp releases dev-tools 21d ago

b9567

server : do not parse when flushing http headers ( #24281 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

26
llama.cpp releases dev-tools 21d ago

b9566

graph: guard iswa kq_mask on its own buffer ( #24294 ) A SWA-only draft head (e.g. StepFun MTP) leaves the base sub-cache empty, so its kq_mask buffer stays null and asserts at load. Guard each mask on its own buffer in set_input and can_reuse, base and swa. Co-authored-by:…

23
llama.cpp releases dev-tools 21d ago

b9565

[ggml-webgpu] Handle buffer overlap / buffer aliasing for concat operator ( #24000 ) Only run webgpu CI on my fork Add webgpu only workflow handle buffer overlap case for concat operator restore build-webgpu.yml Co-Authored-By: Claude Sonnet 4.6 [email protected] Run…

14
llama.cpp releases dev-tools 21d ago

b9564

[ggml-webgpu] Implement 2D workgroups for scale, binary, and unary ops ( #24044 ) Only run webgpu CI on my fork Add webgpu only workflow Implement 2d workgroups for more operations fix Fix type Move back to global_invocation_id macOS/iOS: macOS Apple Silicon (arm64) macOS Apple…

24
llama.cpp releases dev-tools 21d ago

b9562

mtmd : add video input support ( #24269 ) wip ok: lazy bitmap API remember to free lazy text wip add mtmd_helper_video support video input on server (base64 input) add MTMD_VIDEO config add timestamp update CLI cli: allow auto-completion for video add --video arg fix build…

22
llama.cpp releases dev-tools 21d ago

b9561

sync : ggml macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64 (ROCm 7.2) Ubuntu x64…

13
llama.cpp releases dev-tools 21d ago

b9559

cli: fix spinner not show during prompt processing ( #24283 ) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

10

Page 4 of 10 · 454 articles ← Newer Older →

Product

Pricing
Roadmap
Changelog
Incidents

Resources

News RSS
MCP RSS
MCP releases RSS
Incidents RSS
Changelog RSS

Project

About
API
Contact

Legal

Privacy
Terms
Security

Prismix · © 2026 · AI Hub

All product names and logos are trademarks of their respective owners.

Send feedback

Name (optional)

Email *

Message *