MiniMax-M3-EAGLE3-GGUF: llama.cpp-compatible MiniMax M3 EAGLE3 draft model!
Mirrored from r/LocalLLaMA.
Hi all!
With a new llama.cpp PR, the MiniMax M3 EAGLE3 draft model (Inferact/MiniMax-M3-EAGLE3) has been converted to GGUF and runs without issue!
The HF repo has instructions for merging in the PR and for running the model. I tested it on a system with 2x RTX 3090 and 128 GB of DDR4 RAM, using the UD-Q2_K_XL quant, and went from 2.3 tk/s to 5 tk/s thanks to --fit and making sure the draft model sat in VRAM rather than system RAM.
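Going from 2.3 to 5 tk/s is roughly a 2.2x speedup. For intuition on why a draft model helps at all, here is a minimal sketch of the textbook speculative-decoding estimate: if the draft proposes `gamma` tokens per step and each is accepted independently with probability `alpha`, the target model emits (1 - alpha^(gamma+1)) / (1 - alpha) tokens per forward pass on average. This is a simplification under stated assumptions: EAGLE3 drafts token trees rather than a single chain, the `alpha` values below are made up for illustration, and the measured gain in this setup also came from --fit and VRAM placement, not from drafting alone. The function names are hypothetical.

```python
def speedup(baseline_tps: float, new_tps: float) -> float:
    """Ratio of new throughput to baseline throughput."""
    return new_tps / baseline_tps


def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model forward pass when a draft
    proposes `gamma` tokens, each accepted independently with probability
    `alpha`. Simplified linear-chain model (assumption), not EAGLE3's tree."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        # Every drafted token is accepted, plus the one token the target adds.
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    # Throughput numbers reported in the post: 2.3 tk/s before, 5 tk/s after.
    print(f"measured speedup: {speedup(2.3, 5.0):.2f}x")
    # Hypothetical acceptance rates, purely for illustration.
    for alpha in (0.5, 0.7, 0.9):
        print(f"alpha={alpha}: {expected_tokens_per_step(alpha, gamma=4):.2f} tokens/step")
```

Real throughput is lower than tokens per step suggests, because each step also pays for running the draft model, which is one reason keeping the draft in VRAM matters.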
It can be found here: https://huggingface.co/tonjum/MiniMax-M3-EAGLE3-GGUF