llama.cpp releases

b9235

Mirrored from llama.cpp releases for archival readability. Support the source by reading on the original site.

llama : MTP clean-up (#23269)

  • llama : disable equal splits for recurrent memory with partial rollback

  • spec : re-enable p-min with MTP drafts

  • spec : re-enable ngram spec in combination with RS rollback

  • spec : fix ngram-map-* params

  • spec : fix acceptance logic in combined ngram + draft configs

  • graph : fix reuse for combined token + embd batches

  • spec : log parameters for each speculative implementation

      • add LOG_INF in each constructor with implementation type and parameters
      • extract device string logic into common_speculative_get_devices_str()
      • move 'adding speculative implementation' log from init into constructors

Assisted-by: llama.cpp:local pi

  • spec : extend --spec-default with ngram-map-k4v

Assisted-by: llama.cpp:local pi

  • minor : fix n_embd log

  • args : update draft.n_max == 3 + regen docs

  • spec : relax ngram-mod rejection thold to 0.25 @ 5 low

  • logs : improve

  • docs : update speculative decoding CLI argument documentation

      • Add missing draft model CPU scheduling and tensor override parameters
      • Update --spec-type to include all available types (excluding draft-eagle3 WIP)
      • Fix default values to match implementation (n_max=3, n_min=0, p_min=0.0)
      • Remove deprecated options (spec-draft-ctx-size, spec-draft-replace)
      • Add environment variables for new parameters

Assisted-by: llama.cpp:local pi

  • arg : step-back on adding k4v to the default spec config

  • cont : fix name
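
The documentation entry above lists the draft defaults as n_max=3, n_min=0, p_min=0.0. As a rough illustration of how three such limits can interact, here is a minimal Python sketch. It assumes the conventional reading of these names (n_max caps the draft length, p_min stops drafting once the draft model's confidence drops below it, and drafts shorter than n_min are discarded). The names DraftParams, build_draft and toy_propose are hypothetical and are not part of llama.cpp; the actual implementation may differ.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class DraftParams:
    # Defaults taken from the release notes; the semantics are an assumption.
    n_max: int = 3      # maximum number of tokens in one draft
    n_min: int = 0      # drafts shorter than this are discarded
    p_min: float = 0.0  # stop drafting when a token's probability is below this


# Hypothetical proposer: given the draft so far, return (token_id, probability).
Proposer = Callable[[List[int]], Tuple[int, float]]


def build_draft(propose: Proposer, params: DraftParams) -> List[int]:
    """Collect up to n_max draft tokens, stopping early on low confidence."""
    draft: List[int] = []
    for _ in range(params.n_max):
        token, prob = propose(draft)
        if prob < params.p_min:
            break  # draft model is not confident enough to continue
        draft.append(token)
    if len(draft) < params.n_min:
        return []  # too short to be worth verifying with the target model
    return draft


def toy_propose(draft: List[int]) -> Tuple[int, float]:
    """Toy stand-in for a draft model with fixed per-position confidences."""
    probs = [0.9, 0.6, 0.2, 0.8]
    i = len(draft)
    return 100 + i, probs[i]


if __name__ == "__main__":
    print(build_draft(toy_propose, DraftParams()))
    print(build_draft(toy_propose, DraftParams(p_min=0.5)))
    print(build_draft(toy_propose, DraftParams(p_min=0.5, n_min=3)))
```

With the listed defaults, p_min=0.0 never cuts a draft short and n_min=0 never discards one, so in this sketch only n_max limits the draft length.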

Platforms: macOS/iOS, Linux, Android, Windows, openEuler
