llama.cpp releases · May 13, 2026 · 1 min read

b9133

Mirrored from llama.cpp releases for archival readability. Support the source by reading on the original site.

server, webui: support continue generation on reasoning models (#22727)

server, webui : support continue generation on reasoning models (#22727)

Remove the throw blocking assistant prefill on reasoning models and
orchestrate thinking tags around the prefilled message so the parser
routes the next stream chunks correctly. WebUI drops the reasoning
guard on the Continue button, sends reasoning_content with the
prefilled message and persists partial reasoning on stop so the CoT
survives reload and resume.

Scope : templates with a simple thinking_start_tag / thinking_end_tag
pair. Channel-based templates like GPT-OSS are out of scope, pending
a per-template prefill API in common/chat.

First step toward #21754.

chore: update webui build output
server: reject reasoning prefill on channel based templates

macOS/iOS:

Linux:

Android:

Android arm64 (CPU)

Windows:

openEuler:

Discussion (0)

No comments yet. Sign in and be the first to say something.

Discussion (0)

More from llama.cpp releases