r/LocalLLaMA

MiniMax-M3-EAGLE3-GGUF - llama.cpp-compatible MiniMax M3 EAGLE draft model!


Hi all!

With a new PR for llama.cpp, the MiniMax M3 EAGLE draft model (Inferact/MiniMax-M3-EAGLE3) has been successfully converted to GGUF and runs without issue!

The HF repo has instructions for both merging in the PR and running the model. I tested this on a system with 2x RTX 3090 and 128GB of DDR4, running the UD-Q2_K_XL quant, and went from 2.3 tk/s to 5 tk/s, thanks to --fit and making sure the draft model was in VRAM instead of system RAM.
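For a rough sense of what that jump means in practice, here is a minimal sketch that turns the two throughput numbers above into a speedup factor and a wall-clock estimate. The 500-token reply length is a made-up example value, not something I measured, and the helper names are just for illustration.

```python
# Back-of-the-envelope math for the numbers in this post.
# BASELINE_TPS and DRAFT_TPS come from the test above;
# REPLY_TOKENS is a hypothetical reply length chosen for illustration.

BASELINE_TPS = 2.3   # tokens/s without the draft model
DRAFT_TPS = 5.0      # tokens/s with the EAGLE draft model in VRAM
REPLY_TOKENS = 500   # assumption: length of a typical reply


def speedup(baseline_tps: float, new_tps: float) -> float:
    """Ratio of new throughput to baseline throughput."""
    return new_tps / baseline_tps


def seconds_for(tokens: int, tps: float) -> float:
    """Time in seconds to generate `tokens` at a steady `tps` rate."""
    return tokens / tps


if __name__ == "__main__":
    print(f"speedup: {speedup(BASELINE_TPS, DRAFT_TPS):.2f}x")
    print(f"baseline: {seconds_for(REPLY_TOKENS, BASELINE_TPS):.0f} s")
    print(f"with draft: {seconds_for(REPLY_TOKENS, DRAFT_TPS):.0f} s")
```

This assumes a steady generation rate and ignores prompt processing time, so treat the result as a ballpark only.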

It can be found here: https://huggingface.co/tonjum/MiniMax-M3-EAGLE3-GGUF

submitted by /u/maxwell321