The Eagle(3) has landed (for Qwen)
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
https://github.com/ggml-org/llama.cpp/releases/tag/b9723
Available in the latest release. Enabled via:
--spec-type draft-eagle3
You'll need to feed it a draft model. There are issues with unsloth + eagle at the moment, so I've personally tested against:
Model: https://huggingface.co/lmstudio-community/Qwen3.6-27B-GGUF
Draft: https://huggingface.co/wimmmm/Ex0bit-Qwen3.6-27B-PRISM-EAGLE3-GGUF
Specify your draft with -md or --model-draft
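Putting the flags together, a rough sketch of the full invocation might look like the following. It assumes `llama-server` is the binary you run, and the two file paths are placeholders I made up (point them at wherever you saved the GGUF files). The script only prints the command so you can check it before running anything.

```shell
#!/bin/sh
# Placeholder paths (assumptions, not real file names): replace with your downloads.
MODEL="${MODEL:-./models/main.gguf}"
DRAFT="${DRAFT:-./models/draft-eagle3.gguf}"
# Speculative decoding type, as given in the post.
SPEC_TYPE="${SPEC_TYPE:-draft-eagle3}"

# Print the command line rather than executing it (dry run).
# Assumes llama-server is on PATH; -m selects the main model.
build_cmd() {
    echo "llama-server -m $MODEL --model-draft $DRAFT --spec-type $SPEC_TYPE"
}

build_cmd
```

If the printed line looks right, paste it into your terminal. Paths containing spaces would need quoting by hand.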
Performance-wise, I currently get very similar tps to draft-mtp. Tensor parallelism, which I rely on a lot, isn't supported yet and fails an assert. The draft model will also eat a bit of VRAM, so it's not the best if you're running a very tight setup. I'll be keen to see how this develops over time!
Don't forget you can also stack up multiple types of speculative decoding:
--spec-type draft-eagle3,ngram-mod