r/LocalLLaMA

The Eagle(3) has landed (for Qwen)


https://github.com/ggml-org/llama.cpp/releases/tag/b9723

Available in the latest release. Enabled via:

--spec-type draft-eagle3

You'll need to feed it a draft model. There are issues with Unsloth + EAGLE at the moment, so I've personally tested against:

Model: https://huggingface.co/lmstudio-community/Qwen3.6-27B-GGUF
Draft: https://huggingface.co/wimmmm/Ex0bit-Qwen3.6-27B-PRISM-EAGLE3-GGUF

Specify your draft with -md or --model-draft
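Putting the pieces together, a full invocation might look roughly like the sketch below. The file names are hypothetical placeholders (use whatever the downloaded GGUFs are actually called), and llama-server is just one of the llama.cpp binaries you could point this at. The script only prints the command so you can check it before running it.

```shell
# Hypothetical local file names; substitute the GGUF files you downloaded.
MODEL="models/qwen3.6-27b.gguf"
DRAFT="models/qwen3.6-27b-eagle3-draft.gguf"

# -m is the main model, -md (or --model-draft) is the draft model,
# and --spec-type draft-eagle3 is the flag from this post.
CMD="llama-server -m $MODEL -md $DRAFT --spec-type draft-eagle3"

# Print the command for inspection instead of launching it.
echo "$CMD"
```

Once the printed line looks right, paste it into your terminal to launch.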

Performance-wise, I currently get very similar tps to draft-mtp. Tensor parallelism, which I rely on a lot, isn't currently supported and asserts out. The draft model will also eat a bit of VRAM, so it's not the best fit if you're running a very tight setup. I'll be keen to see how this develops over time!

Don't forget you can also stack up multiple types of speculative decoding:

--spec-type draft-eagle3,ngram-mod

submitted by /u/Legitimate-Dog5690