llama.cpp releases · May 25, 2026 · 1 min read

b9319

Mirrored from llama.cpp releases for archival readability. Support the source by reading on the original site.


ggml: gguf_init_from_callback and gguf_init_from_buffer (#22341)

ggml: implement gguf_init_from_buffer
test: gguf_init_from_buffer
fix: memory breakdown for a model loaded with no_alloc from a file is consistent with being loaded from a buffer
fix: use GGML_UNUSED

Co-authored-by: Copilot [email protected]

fix: remove total_size from gguf_reader
fix: file offset calculation, rename offset to data_offset

Co-authored-by: Copilot [email protected]

refactor: extract model loader bug fixes to another PR
feat: add gguf_init_from_callback
fix: always require a max expected size
fix: change gguf_reader_callback_t's output type to void *, change max_expected_size and offsets to uint64_t
fix: harden against offset overflow in buffer read
fix: remove seek behavior from the callback
feat: max_chunk_read == 0 means SIZE_MAX
fix: seeking in a gguf file with no tensors

Co-authored-by: Copilot [email protected]
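
The commit titles above describe a sequential read callback with no seeking, a required maximum expected size, a chunk limit where 0 means SIZE_MAX, and hardening against offset overflow. The following is a minimal sketch of those ideas under stated assumptions. Every name in it (`sketch_read_cb`, `sketch_mem_src`, `sketch_range_ok`, `sketch_mem_read`, `sketch_read_exact`) is hypothetical and chosen for illustration. It does not reproduce the signatures of `gguf_init_from_callback`, `gguf_init_from_buffer`, or `gguf_reader_callback_t`, which are not shown in these notes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; this is NOT the ggml/gguf API. */

/* Sequential read callback: copy up to `len` bytes from the current position
 * into `dst` and return how many bytes were copied (0 at end of data).
 * There is deliberately no seek operation. */
typedef size_t (*sketch_read_cb)(void * user, void * dst, size_t len);

/* In-memory source used to back the callback with a plain buffer. */
struct sketch_mem_src {
    const uint8_t * data;
    uint64_t        size;
    uint64_t        pos;   /* invariant: pos <= size */
};

/* Overflow-safe bounds check: true if [offset, offset + len) fits in `size`.
 * The sum offset + len is never computed, so it cannot wrap around. */
static bool sketch_range_ok(uint64_t offset, uint64_t len, uint64_t size) {
    return offset <= size && len <= size - offset;
}

/* Callback implementation that reads sequentially from a memory buffer. */
static size_t sketch_mem_read(void * user, void * dst, size_t len) {
    struct sketch_mem_src * src = (struct sketch_mem_src *) user;
    uint64_t left = src->size - src->pos;
    size_t   n    = (uint64_t) len < left ? len : (size_t) left;
    memcpy(dst, src->data + src->pos, n);
    src->pos += n;
    return n;
}

/* Read exactly `len` bytes through `cb`, at most `max_chunk` bytes per call.
 * max_chunk == 0 is treated as "no limit" (SIZE_MAX).
 * `*consumed` counts all bytes read so far and may never exceed `max_expected`. */
static bool sketch_read_exact(sketch_read_cb cb, void * user, void * dst, size_t len,
                              size_t max_chunk, uint64_t * consumed, uint64_t max_expected) {
    if (max_chunk == 0) {
        max_chunk = SIZE_MAX;
    }
    if (!sketch_range_ok(*consumed, len, max_expected)) {
        return false; /* request would run past the declared maximum size */
    }
    uint8_t * out = (uint8_t *) dst;
    while (len > 0) {
        size_t want = len < max_chunk ? len : max_chunk;
        size_t got  = cb(user, out, want);
        if (got == 0 || got > want) {
            return false; /* short data or a misbehaving callback */
        }
        out       += got;
        len       -= got;
        *consumed += got;
    }
    return true;
}
```

In this sketch a caller would initialize a `sketch_mem_src` over its buffer, keep a running `consumed` counter starting at 0, and call `sketch_read_exact` for each field it parses. Writing the range check as `len <= size - offset` rather than `offset + len <= size` is the design choice that matters: with 64-bit offsets taken from untrusted file contents, the addition can wrap and pass a naive check.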

Builds listed for this release: macOS/iOS, Linux, Android (arm64, CPU), Windows, openEuler, UI.
